One thing to add: there are a couple of odd things in the Samza
Application Master web UI. The Application Master task ID is -1, and the
Running Containers table is completely empty. How could YARN call a task
“Running” if there’s no container?

Thanks,
Andrew Sannier

On 3/31/15, 2:19 PM, "Andrew Sannier" <asann...@helixeducation.com> wrote:

>Hi all -
>
>Thanks in advance for your help; I have been totally stuck on this for a
>couple of days.
>
>I have a small YARN cluster with one ResourceManager and one NodeManager
>as well as one Zookeeper node and one Kafka node - trying to keep the
>number of moving parts to a minimum. I've been following the guide to
>running Samza on YARN
>(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
>and I get to the end of the tutorial with a Running job in the YARN web
>UI, as expected. However, the job doesn't actually appear to do anything -
>messages are not produced to the "wikipedia-raw" topic (nor is the topic
>created), and no data is logged at all.
>
>To that point, I am having a ton of trouble with Samza's logging. In
>samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
>and in the YARN log directory I have only the resourcemanager log, which of
>course contains no application information. On the NodeManager side,
>samza.log.dir contains application-manager.log, which ends at "[INFO]
>Requesting 1 container(s) with 1700mb of memory" right after the job
>enters the Running state; its own copy of gc.log.0.current; and stderr
>and stdout, which contain no useful information and also don't grow after
>the first second of the job running. In YARN's logs, there's only the node
>manager log, which has no errors or warnings and just logs the startup of
>the container and then its memory usage from then on, which seems fine:
>
>2015-03-31 20:17:34,635 INFO  [Container Monitor]
>monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
>Memory usage of ProcessTree 25767 for container-id
>container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory
>used; 2.4 GB of 3.1 GB virtual memory used
>
>
>What am I missing here? WikipediaFeed.java contains a whole bunch of
>logging statements, but nothing ever hits any file I can find. Even if you
>can't help with the problem I'm having with hello-samza, I would greatly
>appreciate any advice on how I can get useful logs from Samza jobs.
>
>I've checked that I can ping the Wikipedia IRC URL and consume
>from/produce to the Kafka cluster with the console shell scripts from both
>the ResourceManager and NodeManager nodes, and other applications can work
>with my Kafka and Zookeeper with no issues. From the application-master
>log on the worker node, all I can see is that Samza configures the
>Wikipedia IRC system, starts the Webapp, and requests a container. It
>enters the Running state with YARN, after which point nothing happens at
>all. There's no activity at all in the Kafka or Zookeeper logs.
>
>And that's it; the job will run for hours if I let it, but at no point is
>anything produced to Kafka or logged at all. I wrote a simpler task that
>just accepts a JSON message from a topic on Kafka, adds a timestamp, and
>produces to another topic, but almost nothing is different. From the
>application-master log:
>
>2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker
>id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s)
>Set(test)
>2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for
>producing
>2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from
>172.31.2.19:9092
>2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test ->
>SystemStreamMetadata [streamName=test, partitionMetadata={Partition
>[partition=0]=SystemStreamPartitionMetadata [oldestOffset=0,
>newestOffset=4, upcomingOffset=5], Partition
>[partition=1]=SystemStreamPartitionMetadata [oldestOffset=null,
>newestOffset=null, upcomingOffset=0]}])
>
>
>which all looks correct. Then it connects to the ResourceManager, starts
>the Webapp, requests a container, and starts running. All I see in Kafka's
>log is
>
>[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229.
>(kafka.network.Processor)
>[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229.
>(kafka.network.Processor)
>
>
>and Zookeeper has nothing to say at all. As before, no new topic is
>created.
>
>So a huge part of this question is just, what am I missing about logging?
>Where are the actual job/task-level logs? Aside from that, I just have no
>explanation for why nothing is happening in either of these simple tasks.
>I would really appreciate any insight anyone can offer...
>
>Oh, one more thing - there was an error message in Zookeeper after
>submitting my simple StreamTask that I haven't been able to reproduce:
>
>2015-03-31 19:48:28,145 [myid:] - INFO
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
>Accepted socket connection from /172.31.2.19:41801
>2015-03-31 19:48:28,147 [myid:] - WARN
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] -
>Connection request from old client /172.31.2.19:41801; will be dropped if
>server is in r-o mode
>2015-03-31 19:48:28,148 [myid:] - INFO
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client
>attempting to establish new session at /172.31.2.19:41801
>2015-03-31 19:48:28,149 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617]
>- Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for
>client /172.31.2.19:41801
>2015-03-31 19:48:28,202 [myid:] - INFO  [ProcessThread(sid:0
>cport:-1)::PrepRequestProcessor@494] - Processed session termination for
>sessionid: 0x14c70bd0c3e0006
>2015-03-31 19:48:28,206 [myid:] - INFO
>[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
>socket connection for client /172.31.2.19:41801 which had sessionid
>0x14c70bd0c3e0006
>
>
>172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn't
>log anything about this socket being closed or any kind of error. Not sure
>if that's related.
>
>
>Again, thanks a ton for reading and whatever help you can offer.
>
>Andrew Sannier
>Software Engineer, Big Data
>C: 480-284-1048
>www.helixeducation.com
>
>
