---------- Forwarded message ----------
From: Edward J. Yoon <[email protected]>
Date: Wed, Feb 16, 2011 at 11:25 PM
Subject: Re: connecton loss exception
To: "[email protected]" <[email protected]>

I decided to add a "random communication benchmark" tool. This week (or
next week), I'll share my benchmarking experience with you. I have 20
servers (160 cores).

Thanks.

2011/2/16 Edward J. Yoon <[email protected]>:
> Looks like a sync problem. Can you try it again after adding a
> Thread.sleep(100); line?
>
> Sent from my iPhone
>
> On 2011. 2. 16., at 3:24 PM, Paweł Brach <[email protected]> wrote:
>
>> Yes, of course I have. My cluster has been configured and both examples,
>> PiEstimator and SerializePrinting, work (there is communication between
>> 3 nodes). I've modified your example, PiEstimator (put everything in a
>> loop), and it works for a few iterations (there is communication), and
>> after that the connection is lost. The connection is then re-established,
>> but some messages are missing. It looks like the Hama framework is very
>> unstable under load, when many messages are being sent between nodes.
>> On the same cluster I've configured Apache Hadoop and it's very stable.
>> If you have your own cluster configured, could you run my example on it?
>> Have you ever run something more complicated than PiEstimator and
>> SerializePrinting on it?
>>
>> Cheers,
>> Pawel
>>
>> 2011/2/16 Chia-Hung Lin <[email protected]>
>>
>>> Have you configured zookeeper in hama-site.xml? Hama makes use of
>>> zookeeper to do node communication IIRC.
>>>
>>> Opening socket connection to server cl5/127.0.1.1:2181
>>>
>>> suggests that only localhost is up. If this is the case, you can set
>>> the hama.zookeeper.quorum property to, e.g.:
>>>
>>> <property>
>>>   <name>hama.zookeeper.quorum</name>
>>>   <value>node1,node2,node3,node4,node5</value>
>>> </property>
>>>
>>> Hope it helps
>>>
>>> 2011/2/15 Paweł Brach <[email protected]>:
>>>> Hello,
>>>>
>>>> During the last few days I've been testing Hama, and today I found a
>>>> strange error in the Hama framework. If you run a simple job with more
>>>> than a few supersteps, the following error occurs:
>>>>
>>>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
>>>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
>>>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>> java.net.ConnectException: Connection refused
>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
>>>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>>>>
>>>> You can reproduce this by running PiEstimator (the newest source code
>>>> from svn) with a small change: put the whole body of the bsp() method
>>>> in a for loop. So add the following at the beginning:
>>>>
>>>> for (int j = 0; j < 100; j++) {
>>>>   // original bsp() code
>>>> }
>>>>
>>>> When I try to run it, the framework hangs and the error mentioned
>>>> above occurs.
>>>>
>>>> Your help will be appreciated.
>>>>
>>>> Cheers,
>>>>
>>>> --
>>>> Pawel Brach
>>>
>>> --
>>> ChiaHung Lin @ nuk, tw.
>>
>> --
>> Paweł Brach
>

--
Best Regards, Edward J. Yoon
http://blog.udanax.org
http://twitter.com/eddieyoon
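
A note on the quoted log line "Opening socket connection to server cl5/127.0.1.1:2181": 127.0.1.1 is in the loopback range, so on that node the name cl5 appears to resolve to the local machine only, which fits Chia-Hung's remark that only localhost seems to be up. A minimal sketch for checking how each quorum host name resolves on a given node follows. The class name QuorumCheck and the method isLoopback are hypothetical names made up for this illustration; they are not part of Hama or ZooKeeper, and the sketch does not contact ZooKeeper at all.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hama or ZooKeeper.
public class QuorumCheck {

    // Returns true if the host name (or literal IP) resolves to a
    // loopback address such as 127.0.0.1 or 127.0.1.1.
    public static boolean isLoopback(String host) throws UnknownHostException {
        return InetAddress.getByName(host).isLoopbackAddress();
    }

    // Assumed usage: java QuorumCheck node1 node2 node3
    // Prints the address each name resolves to on the machine running it.
    public static void main(String[] args) throws UnknownHostException {
        for (String host : args) {
            InetAddress addr = InetAddress.getByName(host);
            String note = addr.isLoopbackAddress()
                    ? "  (loopback: other nodes cannot reach this address)"
                    : "";
            System.out.println(host + " -> " + addr.getHostAddress() + note);
        }
    }
}
```

Running it on every node with the values from hama.zookeeper.quorum would show whether any of the names map to a loopback address there. Whether that is the cause of the ConnectionLoss reported in this thread is only a guess based on the one log line.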
