Chris - Wulp, now I feel like a moron, but at least things are running now! Thanks a lot for helping me diagnose that. On to the next problem...
Andrew Sannier

On 3/31/15, 4:21 PM, "Chris Riccomini" <criccom...@apache.org> wrote:

> Hey Andrew,
>
> It looks like your attachment was stripped by Apache's mailing server. Looking at the info you pasted, I can tell you that YARN is most likely unable to provision your containers due to a space constraint. Here's the issue:
>
> Memory Used: 1 GB
> Memory Total: 1.76 GB
>
> The YARN AM and the YARN container both take 1G. Your AM is requesting a 1G container, which YARN then queues up while it waits for 1G of space to become available. Because you only have 760MB left on the node, this will never happen. The AM (and YARN) will just sit and wait for more resources.
>
> To test this theory, try setting:
>
> yarn.am.container.memory.mb=512
> yarn.container.memory.mb=512
>
> The first config sets the AM container's memory to 512MB. The second config sets the SamzaContainer's container memory to 512MB. Both of these should fit on a 1.76G node.
>
> Thanks!
> Chris
>
> On Tue, Mar 31, 2015 at 2:59 PM, Andrew Sannier <asann...@helixeducation.com> wrote:
>
>> Thanks so much for getting back to me, Chris.
>>
>> I've attached the AM log from my most recent attempt to run the hello-samza wikipedia-feed task. I've been using pretty small nodes to keep costs down while I test, so that makes a lot of sense (though I had definitely hoped I'd configured appropriate memory ceilings). Here are the values from the YARN UI:
>>
>> Containers Running: 1
>> Memory Used: 1 GB
>> Memory Total: 1.76 GB
>> Memory Reserved: 0 B
>> VCores Used: 1
>> VCores Total: 8
>> VCores Reserved: 0
>> Active Nodes: 1
>> Decommissioned Nodes: 0
>> Lost Nodes: 0
>> Unhealthy Nodes: 0
>> Rebooted Nodes: 0
>>
>> Again, much obliged for your response.
>>
>> Andrew Sannier
>>
>> On 3/31/15, 3:54 PM, "Chris Riccomini" <criccom...@apache.org> wrote:
>>
>>> Hey Andrew,
>>>
>>> I'm wondering if your YARN cluster doesn't have enough memory to fit both
>>> the AM and its containers.
>>> The fact that the AM UI shows no running containers is suspicious. Can you check these settings in your YARN RM's UI:
>>>
>>> Memory Used
>>> Memory Total
>>> Memory Reserved
>>> VCores Used
>>> VCores Total
>>> VCores Reserved
>>>
>>> Can you also attach (or post to gist/pastebin/etc.) the YARN AM's full log?
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Tue, Mar 31, 2015 at 2:32 PM, Andrew Sannier <asann...@helixeducation.com> wrote:
>>>
>>>> Something to add here: there are a couple of weird things in the Samza Application Master web UI. The application master task ID is -1, which seems odd, and the Running Containers table is completely empty. How could YARN call a task "Running" if there's no container?
>>>>
>>>> Thanks,
>>>> Andrew Sannier
>>>>
>>>> On 3/31/15, 2:19 PM, "Andrew Sannier" <asann...@helixeducation.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Thanks in advance for your help; I have been totally stuck on this for a couple of days.
>>>>>
>>>>> I have a small YARN cluster with one ResourceManager and one NodeManager, as well as one Zookeeper node and one Kafka node (trying to keep the number of moving parts to a minimum). I've been following the guide to running Samza on YARN (https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html), and I get to the end of the tutorial with a Running job in the YARN web UI, as expected. However, the job doesn't actually appear to do anything: messages are not produced to the "wikipedia-raw" topic (nor is the topic created), and no data is logged at all.
>>>>>
>>>>> To that point, I am having a ton of trouble with Samza's logging. In samza.log.dir on the ResourceManager node, there's only gc.log.0.current, and in the YARN log directory I have only the ResourceManager log, which of course contains no application information. On the NodeManager side, samza.log.dir contains three things: application-manager.log, which ends at "[INFO] Requesting 1 container(s) with 1700mb of memory" right after the job enters the Running state; its own copy of gc.log.0.current; and stderr and stdout, which contain no useful information and stop growing after the first second of the job running. In YARN's logs, there's only the NodeManager log, which has no errors or warnings and just logs the startup of the container and then its memory usage from then on, which seems fine:
>>>>>
>>>>> 2015-03-31 20:17:34,635 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 25767 for container-id container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory used; 2.4 GB of 3.1 GB virtual memory used
>>>>>
>>>>> What am I missing here? WikipediaFeed.java contains a whole bunch of logging statements, but nothing ever hits any file I can find. Even if you can't help with the problem I'm having with hello-samza, I would greatly appreciate any advice on how I can get useful logs from Samza jobs.
>>>>>
>>>>> I've checked that I can ping the Wikipedia IRC URL and consume from/produce to the Kafka cluster with the console shell scripts from both the ResourceManager and NodeManager nodes, and other applications can work with my Kafka and Zookeeper with no issues.
>>>>> From the application-master log on the worker node, all I can see is that Samza configures the Wikipedia IRC system, starts the Webapp, and requests a container. It enters the Running state with YARN, after which nothing happens at all. There's no activity at all in the Kafka or Zookeeper logs.
>>>>>
>>>>> And that's it; the job will run for hours if I let it, but at no point is anything produced to Kafka or logged at all. I wrote a simpler task that just accepts a JSON message from a topic on Kafka, adds a timestamp, and produces to another topic, but almost nothing is different. From the application-master log:
>>>>>
>>>>> 2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s) Set(test)
>>>>> 2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for producing
>>>>> 2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from 172.31.2.19:9092
>>>>> 2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test -> SystemStreamMetadata [streamName=test, partitionMetadata={Partition [partition=0]=SystemStreamPartitionMetadata [oldestOffset=0, newestOffset=4, upcomingOffset=5], Partition [partition=1]=SystemStreamPartitionMetadata [oldestOffset=null, newestOffset=null, upcomingOffset=0]}])
>>>>>
>>>>> which all looks correct. Then it connects to the ResourceManager, starts the Webapp, requests a container, and starts running. All I see in Kafka's log is:
>>>>>
>>>>> [2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
>>>>> [2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
>>>>>
>>>>> and Zookeeper has nothing to say at all. As before, no new topic is created.
>>>>>
>>>>> So a huge part of this question is just: what am I missing about logging? Where are the actual job/task-level logs? Aside from that, I have no explanation for why nothing is happening in either of these simple tasks. I would really appreciate any insight anyone can offer...
>>>>>
>>>>> Oh, one more thing: there was a warning in the Zookeeper log after submitting my simple StreamTask that I haven't been able to reproduce:
>>>>>
>>>>> 2015-03-31 19:48:28,145 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.31.2.19:41801
>>>>> 2015-03-31 19:48:28,147 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /172.31.2.19:41801; will be dropped if server is in r-o mode
>>>>> 2015-03-31 19:48:28,148 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.31.2.19:41801
>>>>> 2015-03-31 19:48:28,149 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for client /172.31.2.19:41801
>>>>> 2015-03-31 19:48:28,202 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14c70bd0c3e0006
>>>>> 2015-03-31 19:48:28,206 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.31.2.19:41801 which had sessionid 0x14c70bd0c3e0006
>>>>>
>>>>> 172.31.2.19 is the Kafka broker.
>>>>> The job continued unfazed; Samza didn't log anything about this socket being closed or any kind of error. Not sure if that's related.
>>>>>
>>>>> Again, thanks a ton for reading and for whatever help you can offer.
>>>>>
>>>>> Andrew Sannier
>>>>> Software Engineer, Big Data
>>>>> C: 480-284-1048
>>>>> www.helixeducation.com
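Chris's diagnosis in the 4:21 PM reply comes down to capacity arithmetic: the AM container and the SamzaContainer both have to fit in the node's 1.76 GB. The Java sketch below is a hypothetical illustration of that check; `YarnFitCheck` and `fits` are made-up names, not part of Samza or YARN. It treats 1.76 GB as 1760 MB and deliberately ignores scheduler details such as rounding each request up to `yarn.scheduler.minimum-allocation-mb`, which can make a real cluster behave differently.

```java
// Hypothetical sketch: checks whether the Samza AM container and one
// SamzaContainer can both be allocated on a single YARN node.
// All sizes are in MB; the figures come from the thread above.
public class YarnFitCheck {

    /** True if an AM of amMb plus a container of containerMb fit in nodeTotalMb. */
    static boolean fits(int nodeTotalMb, int amMb, int containerMb) {
        return amMb + containerMb <= nodeTotalMb;
    }

    public static void main(String[] args) {
        int nodeTotalMb = 1760; // "Memory Total: 1.76 GB", treated as 1760 MB

        // 1 GB AM + 1 GB container, as described in the reply: stays pending.
        System.out.println("1024 + 1024: " + fits(nodeTotalMb, 1024, 1024)); // prints false
        // The AM log's "Requesting 1 container(s) with 1700mb" cannot fit either.
        System.out.println("1024 + 1700: " + fits(nodeTotalMb, 1024, 1700)); // prints false
        // Suggested fix: yarn.am.container.memory.mb=512, yarn.container.memory.mb=512.
        System.out.println(" 512 +  512: " + fits(nodeTotalMb, 512, 512));   // prints true
    }
}
```

With both sizes at 512 MB the pair needs 1024 MB, which leaves headroom on the 1.76 GB node; with the defaults, the AM occupies the first 1 GB and the container request can never be granted.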