Hey Andrew,

I'm wondering if your YARN cluster doesn't have enough memory to fit both
the AM and its containers. The fact that the AM UI shows no running
containers is suspicious. Can you check these six values in your YARN
RM's UI:

  Memory Used
  Memory Total
  Memory Reserved
  VCores Used
  VCores Total
  VCores Reserved
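
To make the concern concrete, here is a rough sketch of the arithmetic I have
in mind. It is not how YARN's scheduler is implemented. The function names
and the 2048 MB cluster total are hypothetical; the 1 GB AM and the 1700mb
request come from your logs; and the rounding step assumes the default
yarn.scheduler.minimum-allocation-mb of 1024, which may not match your
configuration.

```python
import math


def normalized_request_mb(request_mb, min_alloc_mb=1024):
    """Round a request up to a multiple of the scheduler's minimum allocation.

    Assumption: the scheduler rounds requests up this way, with a 1024 MB minimum.
    """
    return max(1, math.ceil(request_mb / min_alloc_mb)) * min_alloc_mb


def container_fits(total_mb, used_mb, reserved_mb, request_mb, min_alloc_mb=1024):
    """Conservative check: does the request fit in memory that is neither
    used nor reserved, as reported by the RM UI?"""
    free_mb = total_mb - used_mb - reserved_mb
    return normalized_request_mb(request_mb, min_alloc_mb) <= free_mb


# Hypothetical cluster: 2048 MB total, with the 1 GB AM already running.
print(container_fits(total_mb=2048, used_mb=1024, reserved_mb=0, request_mb=1700))
```

If this prints False for the numbers you see in the RM UI, the container
request would sit pending forever, which would match the empty Running
Containers table.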

Can you also attach (or post to gist/pastebin/etc) the YARN AM's full log?

Cheers,
Chris

On Tue, Mar 31, 2015 at 2:32 PM, Andrew Sannier
<asann...@helixeducation.com> wrote:

> Something to add here: there are a couple of weird things in the Samza
> Application Master web UI: Application master task ID is -1, which seems
> odd, and the Running Containers table is completely empty. How could YARN
> call a task “Running” if there’s no container?
>
> Thanks,
> Andrew Sannier
>
>
>
>
>
> On 3/31/15, 2:19 PM, "Andrew Sannier" <asann...@helixeducation.com> wrote:
>
> >Hi all -
> >
> >Thanks in advance for your help; I have been totally stuck on this for a
> >couple of days.
> >
> >I have a small YARN cluster with one ResourceManager and one NodeManager
> >as well as one Zookeeper node and one Kafka node - trying to keep the
> >number of moving parts to a minimum. I've been following the guide to
> >running Samza on YARN
> >(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
> >and I get to the end of the tutorial with a Running job in the YARN web
> >UI, as expected. However, the job doesn't actually appear to do anything -
> >messages are not produced to the "wikipedia-raw" topic (nor is the topic
> >created), and no data is logged at all.
> >
> >To that point, I am having a ton of trouble with Samza's logging. In
> >samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
> >and in the YARN log directory I have only the resourcemanager log, which of
> >course contains no application information. On the NodeManager side,
> >samza.log.dir contains application-manager.log, which ends at "[INFO]
> >Requesting 1 container(s) with 1700mb of memory" right after the job
> >enters the Running state; its own copy of gc.log.0.current; and stderr
> >and stdout, which contain no useful information and also don't grow after
> >the first second of the job running. In YARN's logs, there's only the node
> >manager log, which has no errors or warnings and just logs the startup of
> >the container and then its memory usage from then on, which seems fine:
> >
> >2015-03-31 20:17:34,635 INFO  [Container Monitor]
> >monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> >Memory usage of ProcessTree 25767 for container-id
> >container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory
> >used; 2.4 GB of 3.1 GB virtual memory used
> >
> >
> >What am I missing here? WikipediaFeed.java contains a whole bunch of
> >logging statements, but nothing ever hits any file I can find. Even if you
> >can't help with the problem I'm having with hello-samza, I would greatly
> >appreciate any advice on how I can get useful logs from Samza jobs.
> >
> >I've checked that I can ping the Wikipedia IRC URL and consume
> >from/produce to the Kafka cluster with the console shell scripts from both
> >the ResourceManager and NodeManager nodes, and other applications can work
> >with my Kafka and Zookeeper with no issues. From the application-master
> >log on the worker node, all I can see is that Samza configures the
> >Wikipedia IRC system, starts the Webapp, and requests a container. It
> >enters the Running state with YARN, after which point nothing happens at
> >all. There's no activity at all in the Kafka or Zookeeper logs.
> >
> >And that's it; the job will run for hours if I let it but at no point is
> >anything produced to Kafka or logged at all. I wrote a simpler task that
> >just accepts a json message from a topic on Kafka, adds a timestamp, and
> >produces to another topic, but almost nothing is different. From
> >application-master log:
> >
> >2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker
> >id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s)
> >Set(test)
> >2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for
> >producing
> >2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from
> >172.31.2.19:9092
> >2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test ->
> >SystemStreamMetadata [streamName=test, partitionMetadata={Partition
> >[partition=0]=SystemStreamPartitionMetadata [oldestOffset=0,
> >newestOffset=4, upcomingOffset=5], Partition
> >[partition=1]=SystemStreamPartitionMetadata [oldestOffset=null,
> >newestOffset=null, upcomingOffset=0]}])
> >
> >
> >which all looks correct. Then it connects to the ResourceManager, starts
> >the Webapp, requests a container and starts running. All I see in Kafka's
> >log is
> >
> >[2015-03-31 20:07:05,999] INFO Closing socket connection to
> >/172.31.1.229. (kafka.network.Processor)
> >[2015-03-31 20:07:06,090] INFO Closing socket connection to
> >/172.31.1.229. (kafka.network.Processor)
> >
> >
> >and Zookeeper has nothing to say at all. As before, no new topic is
> >created.
> >
> >So a huge part of this question is just, what am I missing about logging?
> >Where are the actual job/task-level logs? Aside from that, I just have no
> >explanation for why nothing is happening in either of these simple tasks.
> >I would really appreciate any insight anyone can offer...
> >
> >Oh, one more thing - there was an error message in Zookeeper after
> >submitting my simple StreamTask that I haven't been able to reproduce:
> >
> >2015-03-31 19:48:28,145 [myid:] - INFO
> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
> >Accepted socket connection from /172.31.2.19:41801
> >2015-03-31 19:48:28,147 [myid:] - WARN
> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] -
> >Connection request from old client /172.31.2.19:41801; will be dropped if
> >server is in r-o mode
> >2015-03-31 19:48:28,148 [myid:] - INFO
> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client
> >attempting to establish new session at /172.31.2.19:41801
> >2015-03-31 19:48:28,149 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617]
> >- Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for
> >client /172.31.2.19:41801
> >2015-03-31 19:48:28,202 [myid:] - INFO  [ProcessThread(sid:0
> >cport:-1)::PrepRequestProcessor@494] - Processed session termination for
> >sessionid: 0x14c70bd0c3e0006
> >2015-03-31 19:48:28,206 [myid:] - INFO
> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
> >socket connection for client /172.31.2.19:41801 which had sessionid
> >0x14c70bd0c3e0006
> >
> >
> >172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn't
> >log anything about this socket being closed or any kind of error. Not sure
> >if that's related.
> >
> >
> >Again, thanks a ton for reading and whatever help you can offer.
> >
> >Andrew Sannier
> >Software Engineer, Big Data
> >C: 480-284-1048
> >www.helixeducation.com
> >
> >
>
>
