Hi all - Thanks in advance for your help; I have been totally stuck on this for a couple of days.
I have a small YARN cluster with one ResourceManager and one NodeManager, as well as one Zookeeper node and one Kafka node; I'm trying to keep the number of moving parts to a minimum. I've been following the guide to running Samza on YARN (https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html), and I get to the end of the tutorial with a Running job in the YARN web UI, as expected. However, the job doesn't actually appear to do anything: no messages are produced to the "wikipedia-raw" topic (the topic isn't even created), and nothing is logged at all.

On that point, I am having a ton of trouble with Samza's logging. In samza.log.dir on the ResourceManager node there's only gc.log.0.current, and in the YARN log directory there's only the ResourceManager log, which of course contains no application information.

On the NodeManager side, samza.log.dir contains application-manager.log, which ends at "[INFO] Requesting 1 container(s) with 1700mb of memory" right after the job enters the Running state; its own copy of gc.log.0.current; and stderr and stdout, which contain no useful information and stop growing after the first second of the job running. In YARN's logs there's only the NodeManager log, which has no errors or warnings. It just logs the startup of the container and then its memory usage from then on, which seems fine:

2015-03-31 20:17:34,635 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 25767 for container-id container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory used; 2.4 GB of 3.1 GB virtual memory used

What am I missing here? WikipediaFeed.java contains a whole bunch of logging statements, but nothing ever reaches any file I can find. Even if you can't help with the problem I'm having with hello-samza, I would greatly appreciate any advice on how to get useful logs from Samza jobs.
I've checked that I can ping the Wikipedia IRC URL and consume from/produce to the Kafka cluster with the console shell scripts from both the ResourceManager and NodeManager nodes, and other applications work with my Kafka and Zookeeper with no issues. From the application-master log on the worker node, all I can see is that Samza configures the Wikipedia IRC system, starts the Webapp, and requests a container. The job enters the Running state in YARN, after which nothing happens at all. There's no activity in the Kafka or Zookeeper logs, and that's it: the job will run for hours if I let it, but at no point is anything produced to Kafka or logged anywhere.

I also wrote a simpler task that just accepts a JSON message from a Kafka topic, adds a timestamp, and produces to another topic, but the result is almost exactly the same. From the application-master log:

2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s) Set(test)
2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for producing
2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from 172.31.2.19:9092
2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test -> SystemStreamMetadata [streamName=test, partitionMetadata={Partition [partition=0]=SystemStreamPartitionMetadata [oldestOffset=0, newestOffset=4, upcomingOffset=5], Partition [partition=1]=SystemStreamPartitionMetadata [oldestOffset=null, newestOffset=null, upcomingOffset=0]}])

That all looks correct. Then it connects to the ResourceManager, starts the Webapp, requests a container, and starts running. All I see in Kafka's log is:

[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)

and Zookeeper has nothing to say at all. As before, no new topic is created.
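For reference, the per-message work in that simpler task is tiny. Below is a minimal sketch in plain Java, assuming the JSON serde hands each message to the task as a Map<String, Object>. The class TimestampEnricher, the method addTimestamp and the field name "timestamp" are hypothetical, and the Samza StreamTask wiring is deliberately left out so the snippet compiles and runs without Samza on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the per-message logic of the "add a timestamp" task,
// kept free of Samza classes so it can be compiled and tested on its own.
final class TimestampEnricher {
    // Hypothetical name of the field added to each JSON message.
    static final String TIMESTAMP_FIELD = "timestamp";

    private TimestampEnricher() {}

    // Returns a copy of the incoming message with the given time (epoch millis)
    // added; the original map is left untouched.
    static Map<String, Object> addTimestamp(Map<String, Object> message, long nowMillis) {
        Map<String, Object> enriched = new HashMap<>(message);
        enriched.put(TIMESTAMP_FIELD, nowMillis); // autoboxed to Long
        return enriched;
    }

    // Convenience overload that stamps the message with the current wall-clock time.
    static Map<String, Object> addTimestamp(Map<String, Object> message) {
        return addTimestamp(message, System.currentTimeMillis());
    }
}
```

Inside a StreamTask's process(IncomingMessageEnvelope, MessageCollector, TaskCoordinator), this would be applied to envelope.getMessage() and the result passed to collector.send(new OutgoingMessageEnvelope(...)) with the output topic's SystemStream. Keeping the transformation separate like this makes it easy to confirm the logic itself works independently of the YARN deployment.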
So a huge part of this question is just: what am I missing about logging? Where are the actual job/task-level logs? Aside from that, I have no explanation for why nothing is happening in either of these simple tasks. I would really appreciate any insight anyone can offer.

Oh, one more thing: there was an error message in Zookeeper after submitting my simple StreamTask that I haven't been able to reproduce:

2015-03-31 19:48:28,145 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.31.2.19:41801
2015-03-31 19:48:28,147 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /172.31.2.19:41801; will be dropped if server is in r-o mode
2015-03-31 19:48:28,148 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.31.2.19:41801
2015-03-31 19:48:28,149 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for client /172.31.2.19:41801
2015-03-31 19:48:28,202 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14c70bd0c3e0006
2015-03-31 19:48:28,206 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.31.2.19:41801 which had sessionid 0x14c70bd0c3e0006

172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn't log anything about this socket being closed or any kind of error. I'm not sure if that's related.

Again, thanks a ton for reading and for whatever help you can offer.

Andrew Sannier
Software Engineer, Big Data
C: 480-284-1048
www.helixeducation.com