Hi Roger,

"but it only spawns one container and still hangs after bootstrap" -- this is probably because your local machine does not have enough resources for the second container. I checked your log file, and each container is about 4GB.
"When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it gets hung after consuming the bootstrap topic." -- Have you figured it out? I have looked at your log and also at the code. My suspicion is that a null envelope is somehow blocking the process. If you can paste the trace-level log, that would be more helpful, because many of the log statements in the chooser are at trace level.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

> I need some help. I have a job which bootstraps one stream and then is
> supposed to read from two. When I run it on our YARN cluster with a single
> container, it works correctly. When I tried it with 5 containers, it gets
> hung after consuming the bootstrap topic. I ran it with the grid script on
> my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
> container and still hangs after bootstrap.
>
> Debug logs are here: http://pastebin.com/af3KPvju
>
> I looked at JMX metrics and see:
> - Task Metrics - no value for kafka offset of non-bootstrapped stream
> - SystemConsumerMetrics
>   - choose null keeps incrementing
>   - ssps-needed-by-chooser 1
>   - unprocessed-messages 62k
> - Bootstrapping Chooser
>   - lagging partitions 4
>   - laggin-batch-streams - 4
>   - batch-resets - 0
>
> Has anyone seen this or can offer ideas of how to better debug it?
>
> I'm using Samza 0.9.0 and YARN 2.4.0.
>
> Thanks!
>
> Roger
>