Hi Andrew,

My first thought is that the container keeps failing because of an exception. Could you check that the AM and all the containers are running successfully? You can find the logs in $HADOOP_Home/logs/userlogs.

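A rough sketch of that check, in case it saves some clicking through directories: the script below walks a log directory and lists every file that contains an error marker. The names `find_failures` and `FAILURE_PATTERN` are made up for this illustration, the pattern is only a guess at what a failure looks like in these logs, and the upper-case `HADOOP_HOME` environment variable is an assumption about your setup.

```python
import os
import re
from pathlib import Path

# Assumption: a failing container logs a line containing ERROR, FATAL,
# or a Java exception class name. Adjust the pattern to your logs.
FAILURE_PATTERN = re.compile(r"\b(ERROR|FATAL)\b|Exception")


def find_failures(log_root):
    """Return {file path: [matching lines]} for every file under log_root."""
    hits = {}
    root = Path(log_root)
    if not root.is_dir():
        return hits
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as handle:
            matches = [line.rstrip("\n") for line in handle
                       if FAILURE_PATTERN.search(line)]
        if matches:
            hits[str(path)] = matches
    return hits


if __name__ == "__main__":
    # Assumption: HADOOP_HOME points at the Hadoop install on this node.
    userlogs = Path(os.environ.get("HADOOP_HOME", ".")) / "logs" / "userlogs"
    for log_file, lines in find_failures(userlogs).items():
        print(log_file)
        for line in lines[:5]:  # show only the first few matches per file
            print("    " + line)
```

If this prints nothing on any node, the containers may never have been launched at all, which would be worth checking separately.
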
Thanks,

Fang, Yan
[email protected]

On Mon, Mar 30, 2015 at 4:05 PM, Andrew Sannier <[email protected]> wrote:

> Hi -
>
> Thanks in advance for your help.
>
> I have been following this guide
> http://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html
> trying to prove that my samza cluster runs. I get as far as having a
> Running YARN task, as the tutorial specifies, but this task doesn’t
> actually do anything. No log that I’ve found (I’ve looked at application
> master, yarn resource manager, node manager logs, as well as the stderror
> and stdout userlogs on the resource manager nodes) shows any kind of error
> or warning; they simply stop growing after the initial setup with
>
> 2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 containers
> 2015-03-30 20:27:48 SamzaAppMasterTaskManager [INFO] Requesting 1 container(s) with 850mb of memory
>
> The app doesn't die or anything, but I never see any data flowing through
> kafka from the wikipedia feed.
>
> On the Kafka side, the logs show something very similar to the logs here:
> https://issues.apache.org/jira/browse/KAFKA-1393, suggesting that Samza
> is creating and closing many connections in sequence (though I have no idea
> why). Excerpt:
>
> [2015-03-30 22:45:59,561] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
> [2015-03-30 22:45:59,592] INFO Closing socket connection to /172.31.11.241. (kafka.network.Processor)
> [2015-03-30 22:49:29,927] INFO Closing socket connection to /172.31.11.206. (kafka.network.Processor)
>
> *.241 is the single ResourceManager node in my YARN cluster and *.206 is
> the single Kafka broker itself, the box on which I viewed this log. Then I
> see this error:
>
> [2015-03-30 22:49:49,261] ERROR Closing socket for /172.31.11.206 because of error (kafka.network.Processor)
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>     at kafka.utils.Utils$.read(Utils.scala:375)
>     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>     at kafka.network.Processor.read(SocketServer.scala:347)
>     at kafka.network.Processor.run(SocketServer.scala:245)
>     at java.lang.Thread.run(Thread.java:745)
>
> As far as I can tell, this suggests that Samza reset the connection. The
> only other weirdness in the logs is in the ApplicationManager’s garbage
> collection log, which looks like this:
>
> 2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]
> 2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure) 16516K->8128K(31808K), 0.0025949 secs]
> 2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure) 16960K->7890K(31808K), 0.0021872 secs]
> 2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure) 16722K->8642K(31808K), 0.0032969 secs]
> 2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure) 17474K->8476K(31808K), 0.0029685 secs]
>
> Is it possible that the garbage collection cycles are causing Samza to
> rapidly recreate connections to Zookeeper/Kafka? Zookeeper’s logs also
> suggest that consumers are being created and deleted rapidly:
>
> 2015-03-30 22:53:54,371 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x2 zxid:0xad txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/ids Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/ids
>
> 2015-03-30 22:53:54,374 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x3 zxid:0xae txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758
>
> 2015-03-30 22:53:54,678 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x17 zxid:0xb2 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners/test Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners/test
>
> 2015-03-30 22:53:54,681 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x18 zxid:0xb3 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/owners Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/owners
>
> 2015-03-30 22:53:57,223 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x23 zxid:0xb8 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/0 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/0
>
> 2015-03-30 22:53:57,229 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x24 zxid:0xb9 txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets
>
> 2015-03-30 22:53:57,255 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:setData cxid:0x28 zxid:0xbd txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test/1 Error:KeeperErrorCode = NoNode for /consumers/console-consumer-43758/offsets/test/1
>
> 2015-03-30 22:53:57,257 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x14c6ccff26d0013 type:create cxid:0x29 zxid:0xbe txntype:-1 reqpath:n/a Error Path:/consumers/console-consumer-43758/offsets/test Error:KeeperErrorCode = NodeExists for /consumers/console-consumer-43758/offsets/test
>
> Any help will be greatly appreciated – I’m really stuck on this one.
>
> Thanks,
> Andrew Sannier
> Software Engineer, Big Data
> Helix Education
> www.helixeducation.com
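
On the garbage-collection question in the quoted message: each `[GC (Allocation Failure) ...]` line is a minor (young-generation) collection and records its own pause time, so the pauses can be totalled straight from the excerpt. The sketch below does that for the five quoted lines. `GC_LINE` and `summarize_gc` are names invented for this example, and the regular expression is only written to fit the log format shown in the message, not every JVM's GC log.

```python
import re

# Fits lines like:
# "... [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]"
GC_LINE = re.compile(
    r"\[GC \((?P<cause>[^)]+)\)\s+"
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<heap>\d+)K\), "
    r"(?P<secs>[\d.]+) secs\]"
)

# The five GC log lines from the quoted message, with the mail wrapping undone.
QUOTED_GC_LINES = [
    "2015-03-30T22:46:00.670+0000: 4.674: [GC (Allocation Failure) 16244K->7692K(31808K), 0.0029805 secs]",
    "2015-03-30T22:46:00.721+0000: 4.725: [GC (Allocation Failure) 16516K->8128K(31808K), 0.0025949 secs]",
    "2015-03-30T22:46:00.818+0000: 4.822: [GC (Allocation Failure) 16960K->7890K(31808K), 0.0021872 secs]",
    "2015-03-30T22:46:01.042+0000: 5.046: [GC (Allocation Failure) 16722K->8642K(31808K), 0.0032969 secs]",
    "2015-03-30T22:51:56.920+0000: 360.924: [GC (Allocation Failure) 17474K->8476K(31808K), 0.0029685 secs]",
]


def summarize_gc(lines):
    """Count matching GC lines and report total and longest pause in seconds."""
    pauses = []
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            pauses.append(float(match.group("secs")))
    return {
        "count": len(pauses),
        "total_secs": sum(pauses),
        "max_secs": max(pauses, default=0.0),
    }


if __name__ == "__main__":
    summary = summarize_gc(QUOTED_GC_LINES)
    print("{count} collections, {total:.1f} ms total, longest {longest:.1f} ms".format(
        count=summary["count"],
        total=summary["total_secs"] * 1000,
        longest=summary["max_secs"] * 1000,
    ))
```

For the quoted excerpt this comes to five collections and about 14 ms of pause in total, with no single pause above roughly 3.3 ms. That is only a reading of the excerpt rather than a diagnosis, but pauses that short look like an unlikely cause of reset connections on their own.
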
