Maybe I made a mistake, but here are the logs. In the groom's log I've filtered out the noise from the benchmark task output. I hope it helps. What additional information do you need?

groom: http://justpaste.it/i22
zookeeper: http://justpaste.it/i23

The master shows nothing, just the normal output.
thanks :)

2011/9/22 ChiaHung Lin <[email protected]>

> Is there any "Ignore because znode may be deleted." sentence just above the
> NoNodeException? This exception is thrown as warning which should not stop
> the computation.
>
> Also, I test with pseudo-distributed mode as below
>
> for((i=0;i<20;i++)) ; do hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi; done
>
> It works ok.
> http://pastebin.com/CxGSfzHN
>
> And the log has exception which doesn't cause computation to hang
>
> http://pastebin.com/5HVwx6A1
>
> attempt_201109221848_0020_000000_0 11/09/22 18:57:37 WARN bsp.BSPPeer: Ignore because znode may be deleted.
> 2011-09-22 18:57:37,331 INFO org.apache.hama.bsp.TaskRunner: attempt_201109221848_0020_000000_0 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201109221848_0020/0/ready
>
> Can we have the full log post? And how it is executed, env, etc. Maybe the
> problem stems from somewhere else.
>
> -----Original message-----
> From:Thomas Jungblut <[email protected]>
> To:[email protected],[email protected]
> Date:Thu, 22 Sep 2011 10:43:13 +0200
> Subject:Re: Awesome bench results after removing Thread.sleep in sync() method.
>
> I think when just changing the log level, log4j will take care of the
> if(isEnabled) stuff, so we don't need to fragment our code.
> Yes the current rev in trunk contains this snippet.
> I give you the rest of the exception:
>
> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201109220959_0001/224/ready
> >   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
> >   at org.apache.hama.bsp.BSPPeer$1.process(BSPPeer.java:396)
> >   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
>
> Here is the part of the log of our zookeeper deamon:
>
> > 2011-09-22 09:59:59,435 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x1329025208e0003 type:delete cxid:0xc01 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/bsp/job_201109220959_0001/222/ready Error:KeeperErrorCode = NoNode for /bsp/job_201109220959_0001/222/ready
> > 2011-09-22 09:59:59,499 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x1329025208e0003 type:create cxid:0xc0e zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/bsp/job_201109220959_0001/223/ready Error:KeeperErrorCode = NodeExists for /bsp/job_201109220959_0001/223/ready
> > 2011-09-22 09:59:59,627 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x1329025208e0004 type:delete cxid:0xc22 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/bsp/job_201109220959_0001/224/ready Error:KeeperErrorCode = NoNode for /bsp/job_201109220959_0001/224/ready
>
> 2011/9/22 ChiaHung Lin <[email protected]>
>
> > We might need to change log method by adding
> >
> > if(LOG.isInfoEnabled()){
> > ...
> > }
> >
> > at least it can prevent string concatenation for performance optimization.
> > (debug can be changed to if(LOG.isDebugEnabled()){} for performance
> > optimization, too.)
> >
> > In addition, can you help check if enterBarrier() contains the following
> > code snippet?
> >
> > ...
> > zk.exists(pathToSuperstepZnode+"/ready", new Watcher() {
> >   @Override
> >   public void process(WatchedEvent event) {
> >     // check if /ready znode exists, then delete it.
> >     ...
> >     } catch(KeeperException.NoNodeException nne) {
> >       LOG.warn("Ignore because znode may be deleted.", nne);
> >     }...
> >   }
> > });
> > zk.create(getNodeName(), null, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
> > ...
> >
> > It looks like bsp peer is trying to remove /ready znode which may have
> > already been removed by other bsp peer. Or stack trace in log would be
> > helpful.
> >
> > -----Original message-----
> > From:Thomas Jungblut <[email protected]>
> > To:[email protected]
> > Date:Thu, 22 Sep 2011 10:05:52 +0200
> > Subject:Re: Awesome bench results after removing Thread.sleep in sync() method.
> >
> > You're going to laugh, but we spend 80% of the time, logging the messages.
> > Let's change the log level to debug or remove the logging in the bench
> > example.
> >
> > Sadly I still receive
> >
> > > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201109220959_0001/224/ready
> >
> > and it hangs forever. Current version is after you committed ChiaHung's patch.
> > I'm in pseudo-distributed mode with 3 tasks.
> >
> > Are you going to bench this without the logging? That would be interesting
> > though ;D
> >
> > 2011/9/22 Thomas Jungblut <[email protected]>
> >
> > > That is great. I think we can push this under 200s.
> > > I attach a profiler and send you a list of hotspots.
> > >
> > > lg.
> > >
> > > 2011/9/22 Edward J. Yoon <[email protected]>
> > >
> > >> By ChiaHung's HAMA-387.patch, hang problem is fixed.
> > >>
> > >> And also, on same environment (1 rack, 256 cores), a bench example
> > >> result is dramatically improved. (184.076 seconds from 307.129
> > >> seconds)
> > >>
> > >> ----
> > >> # core/bin/hama jar examples/target/hama-examples-0.4.0-incubating-SNAPSHOT.jar bench 16 1000 512
> > >> ..
> > >> 11/09/22 10:27:32 INFO bsp.BSPJobClient: Current supersteps number: 504
> > >> 11/09/22 10:27:35 INFO bsp.BSPJobClient: Current supersteps number: 508
> > >> 11/09/22 10:27:38 INFO bsp.BSPJobClient: Current supersteps number: 512
> > >> 11/09/22 10:27:38 INFO bsp.BSPJobClient: The total number of supersteps: 512
> > >> Job Finished in 184.076 seconds
> > >>
> > >> Hama 0.4 (r.1163903) was:
> > >>
> > >> 16 bytes | 1000 | 512 | 307.129 seconds
> > >>
> > >> --
> > >> Best Regards, Edward J. Yoon
> > >> @eddieyoon
> > >>
> > >
> > > --
> > > Thomas Jungblut
> > > Berlin
> > >
> > > mobile: 0170-3081070
> > >
> > > business: [email protected]
> > > private: [email protected]
> > >
> >
> > --
> > Thomas Jungblut
> > Berlin
> >
> > mobile: 0170-3081070
> >
> > business: [email protected]
> > private: [email protected]
> >
> > --
> > ChiaHung Lin
> > Department of Information Management
> > National University of Kaohsiung
> > Taiwan
> >
>
> --
> Thomas Jungblut
> Berlin
>
> mobile: 0170-3081070
>
> business: [email protected]
> private: [email protected]
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>

--
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: [email protected]
private: [email protected]
