Thanks so much for getting back to me, Chris. I’ve attached the AM log from my most recent attempt to run the hello-samza wikipedia-feed task. I’ve been using pretty small nodes to keep costs down while I test and so forth, so that makes a lot of sense (though I definitely hoped I’d configured appropriate memory ceilings). Here are the values from the YARN UI:
Containers Running: 1
Memory Used: 1 GB
Memory Total: 1.76 GB
Memory Reserved: 0 B
VCores Used: 1
VCores Total: 8
VCores Reserved: 0
Active Nodes: 1
Decommissioned Nodes: 0
Lost Nodes: 0
Unhealthy Nodes: 0
Rebooted Nodes: 0

Again, much obliged for your response.

Andrew Sannier

On 3/31/15, 3:54 PM, "Chris Riccomini" <criccom...@apache.org> wrote:

>Hey Andrew,
>
>I'm wondering if your YARN cluster doesn't have enough memory to fit both
>the AM and its containers. The fact that the AM UI shows no running
>containers is suspicious. Can you check these settings in your YARN
>RM's UI:
>
>  Memory Used
>  Memory Total
>  Memory Reserved
>  VCores Used
>  VCores Total
>  VCores Reserved
>
>Can you also attach (or post to gist/pastebin/etc) the YARN AM's full log?
>
>Cheers,
>Chris
>
>On Tue, Mar 31, 2015 at 2:32 PM, Andrew Sannier
><asann...@helixeducation.com> wrote:
>
>> Something to add here: there are a couple of weird things in the Samza
>> Application Master web UI: Application master task ID is -1, which seems
>> odd, and the Running Containers table is completely empty. How could YARN
>> call a task “Running” if there’s no container?
>>
>> Thanks,
>> Andrew Sannier
>>
>> On 3/31/15, 2:19 PM, "Andrew Sannier" <asann...@helixeducation.com> wrote:
>>
>> >Hi all -
>> >
>> >Thanks in advance for your help; I have been totally stuck on this for a
>> >couple of days.
>> >
>> >I have a small YARN cluster with one ResourceManager and one NodeManager
>> >as well as one Zookeeper node and one Kafka node - trying to keep the
>> >number of moving parts to a minimum. I’ve been following the guide to
>> >running Samza on YARN
>> >(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
>> >and I get to the end of the tutorial with a Running job in the YARN web
>> >UI, as expected.
>> >However, the job doesn’t actually appear to do anything -
>> >messages are not produced to the “wikipedia-raw” topic (nor is the topic
>> >created), and no data is logged at all.
>> >
>> >To that point, I am having a ton of trouble with Samza’s logging - in
>> >samza.log.dir on the ResourceManager node, there’s only gc.log.0.current,
>> >and in the YARN log directory I have only the resourcemanager log, which of
>> >course contains no application information. On the NodeManager side,
>> >samza.log.dir contains application-manager.log, which ends at “[INFO]
>> >Requesting 1 container(s) with 1700mb of memory” right after the job
>> >enters the Running state, its own copy of gc.log.0.current, and stderr
>> >and stdout, which contain no useful information and also don’t grow after
>> >the first second of the job running. In YARN’s logs, there’s only the node
>> >manager log, which has no errors or warnings and just logs the startup of
>> >the container and then its memory usage from then on, which seems fine:
>> >
>> >2015-03-31 20:17:34,635 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 25767 for container-id container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory used; 2.4 GB of 3.1 GB virtual memory used
>> >
>> >What am I missing here? WikipediaFeed.java contains a whole bunch of
>> >logging statements, but nothing ever hits any file I can find. Even if you
>> >can’t help with the problem I’m having with hello-samza, I would greatly
>> >appreciate any advice on how I can get useful logs from Samza jobs.
>> >
>> >I’ve checked that I can ping the Wikipedia IRC URL and consume
>> >from/produce to the Kafka cluster with the console shell scripts from both
>> >the ResourceManager and NodeManager nodes, and other applications can work
>> >with my Kafka and Zookeeper with no issues.
>> >From the application-master
>> >log on the worker node, all I can see is that Samza configures the
>> >Wikipedia IRC system, starts the Webapp, and requests a container. It
>> >enters the Running state with YARN, after which point nothing happens at
>> >all. There’s no activity at all in the Kafka or Zookeeper logs.
>> >
>> >And that’s it; the job will run for hours if I let it, but at no point is
>> >anything produced to Kafka or logged at all. I wrote a simpler task that
>> >just accepts a JSON message from a topic on Kafka, adds a timestamp, and
>> >produces to another topic, but almost nothing is different. From the
>> >application-master log:
>> >
>> >2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s) Set(test)
>> >2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for producing
>> >2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from 172.31.2.19:9092
>> >2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test -> SystemStreamMetadata [streamName=test, partitionMetadata={Partition [partition=0]=SystemStreamPartitionMetadata [oldestOffset=0, newestOffset=4, upcomingOffset=5], Partition [partition=1]=SystemStreamPartitionMetadata [oldestOffset=null, newestOffset=null, upcomingOffset=0]}])
>> >
>> >which all looks correct. Then it connects to the ResourceManager, starts the
>> >Webapp, requests a container, and starts running. All I see in Kafka’s log
>> >is
>> >
>> >[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
>> >[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
>> >
>> >and Zookeeper has nothing to say at all. As before, no new topic is
>> >created.
>> >
>> >So a huge part of this question is just, what am I missing about logging?
>> >Where are the actual job/task-level logs? Aside from that, I just have no
>> >explanation for why nothing is happening in either of these simple tasks.
>> >I would really appreciate any insight anyone can offer...
>> >
>> >Oh, one more thing - there was an error message in Zookeeper after
>> >submitting my simple StreamTask that I haven’t been able to reproduce:
>> >
>> >2015-03-31 19:48:28,145 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.31.2.19:41801
>> >2015-03-31 19:48:28,147 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /172.31.2.19:41801; will be dropped if server is in r-o mode
>> >2015-03-31 19:48:28,148 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.31.2.19:41801
>> >2015-03-31 19:48:28,149 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for client /172.31.2.19:41801
>> >2015-03-31 19:48:28,202 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14c70bd0c3e0006
>> >2015-03-31 19:48:28,206 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.31.2.19:41801 which had sessionid 0x14c70bd0c3e0006
>> >
>> >172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn’t
>> >log anything about this socket being closed or any kind of error. Not sure
>> >if that’s related.
>> >
>> >Again, thanks a ton for reading and whatever help you can offer.
>> >
>> >Andrew Sannier
>> >Software Engineer, Big Data
>> >C: 480-284-1048
>> >www.helixeducation.com
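
A rough check of the memory hypothesis raised in this thread, using only the numbers quoted above: the RM UI's "Memory Total" and "Memory Used", and the 1700mb container request from the AM log. This is a back-of-the-envelope sketch, not how YARN's scheduler actually decides. It assumes 1 GB = 1024 MB in the UI and ignores allocation rounding and per-node limits. The helper name `fits` is made up for illustration.

```python
# Values read from the YARN RM UI and the AM log quoted in this thread.
GB_IN_MB = 1024  # assumption: the UI reports binary gigabytes

memory_total_mb = 1.76 * GB_IN_MB   # "Memory Total: 1.76 GB"
memory_used_mb = 1 * GB_IN_MB       # "Memory Used: 1 GB" (the one running container)
requested_container_mb = 1700       # "Requesting 1 container(s) with 1700mb of memory"


def fits(total_mb: float, used_mb: float, request_mb: float) -> bool:
    """Return True if request_mb fits in the memory not yet in use.

    Simplification: real YARN schedulers also round requests up to an
    allocation increment and enforce per-node maximums.
    """
    return request_mb <= total_mb - used_mb


free_mb = memory_total_mb - memory_used_mb
print(f"free: {free_mb:.0f} MB, requested: {requested_container_mb} MB")
print("request fits:", fits(memory_total_mb, memory_used_mb, requested_container_mb))
```

With these figures, roughly 778 MB is free against a 1700 MB request, so the request could not be satisfied. That would be consistent with a job that shows as Running (the AM is up) while no task container ever starts and no task-level logs appear. If that is indeed the cause, either a smaller container request or more memory on the NodeManager should let the container be allocated.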