Something to add here: there are a couple of weird things in the Samza Application Master web UI. The Application Master task ID is -1, which seems odd, and the Running Containers table is completely empty. How could YARN call a task “Running” if there’s no container?
Thanks,
Andrew Sannier

On 3/31/15, 2:19 PM, "Andrew Sannier" <asann...@helixeducation.com> wrote:

>Hi all -
>
>Thanks in advance for your help; I have been totally stuck on this for a
>couple of days.
>
>I have a small YARN cluster with one ResourceManager and one NodeManager
>as well as one Zookeeper node and one Kafka node - trying to keep the
>number of moving parts to a minimum. I've been following the guide to
>running Samza on YARN
>(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
>and I get to the end of the tutorial with a Running job in the YARN web
>UI, as expected. However, the job doesn't actually appear to do anything -
>messages are not produced to the "wikipedia-raw" topic (nor is the topic
>created), and no data is logged at all.
>
>To that point, I am having a ton of trouble with Samza's logging - in
>samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
>and in the YARN log directory I have only the resourcemanager log which of
>course contains no application information. On the NodeManager side,
>samza.log.dir contains application-manager.log, which ends at "[INFO]
>Requesting 1 container(s) with 1700mb of memory" right after the job
>enters the Running state, its own copy of gc.log.0.current, and stderr
>and stdout which contain no useful information and also don't grow after
>the first second of the job running. In YARN's logs, there's only the node
>manager log, which has no errors or warnings and just logs the startup of
>the container and then its memory usage from then on, which seems fine:
>
>2015-03-31 20:17:34,635 INFO [Container Monitor]
>monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
>Memory usage of ProcessTree 25767 for container-id
>container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory
>used; 2.4 GB of 3.1 GB virtual memory used
>
>What am I missing here?
>WikipediaFeed.java contains a whole bunch of
>logging statements, but nothing ever hits any file I can find. Even if you
>can't help with the problem I'm having with hello-samza, I would greatly
>appreciate any advice on how I can get useful logs from Samza jobs.
>
>I've checked that I can ping the Wikipedia IRC URL and consume
>from/produce to the Kafka cluster with the console shell scripts from both
>the ResourceManager and NodeManager nodes, and other applications can work
>with my Kafka and Zookeeper with no issues. From the application-master
>log on the worker node, all I can see is that Samza configures the
>Wikipedia IRC system, starts the Webapp, and requests a container. It
>enters the Running state with YARN, after which point nothing happens at
>all. There's no activity at all in the Kafka or Zookeeper logs.
>
>And that's it; the job will run for hours if I let it but at no point is
>anything produced to Kafka or logged at all. I wrote a simpler task that
>just accepts a JSON message from a topic on Kafka, adds a timestamp, and
>produces to another topic, but almost nothing is different. From
>application-master log:
>
>2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker
>id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s)
>Set(test)
>2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for
>producing
>2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from
>172.31.2.19:9092
>2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test ->
>SystemStreamMetadata [streamName=test, partitionMetadata={Partition
>[partition=0]=SystemStreamPartitionMetadata [oldestOffset=0,
>newestOffset=4, upcomingOffset=5], Partition
>[partition=1]=SystemStreamPartitionMetadata [oldestOffset=null,
>newestOffset=null, upcomingOffset=0]}])
>
>which all looks correct. Then it connects to ResourceManager, starts the
>Webapp, requests a container and starts running.
>All I see in Kafka's log is
>
>[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229.
>(kafka.network.Processor)
>[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229.
>(kafka.network.Processor)
>
>and Zookeeper has nothing to say at all. As before, no new topic is
>created.
>
>So a huge part of this question is just, what am I missing about logging?
>Where are the actual job/task-level logs? Aside from that, I just have no
>explanation for why nothing is happening in either of these simple tasks.
>I would really appreciate any insight anyone can offer...
>
>Oh, one more thing - there was an error message in Zookeeper after
>submitting my simple StreamTask that I haven't been able to reproduce:
>
>2015-03-31 19:48:28,145 [myid:] - INFO
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
>Accepted socket connection from /172.31.2.19:41801
>2015-03-31 19:48:28,147 [myid:] - WARN
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] -
>Connection request from old client /172.31.2.19:41801; will be dropped if
>server is in r-o mode
>2015-03-31 19:48:28,148 [myid:] - INFO
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client
>attempting to establish new session at /172.31.2.19:41801
>2015-03-31 19:48:28,149 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617]
>- Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for
>client /172.31.2.19:41801
>2015-03-31 19:48:28,202 [myid:] - INFO [ProcessThread(sid:0
>cport:-1)::PrepRequestProcessor@494] - Processed session termination for
>sessionid: 0x14c70bd0c3e0006
>2015-03-31 19:48:28,206 [myid:] - INFO
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
>socket connection for client /172.31.2.19:41801 which had sessionid
>0x14c70bd0c3e0006
>
>172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn't
>log anything about this socket being closed or any kind of error.
>Not sure if that's related.
>
>Again, thanks a ton for reading and whatever help you can offer.
>
>Andrew Sannier
>Software Engineer, Big Data
>C: 480-284-1048
>www.helixeducation.com