> I don't know what's going on but it works! Thread.sleep(100) helps!

Then it's a mutex-related problem. We'll fix it soon. :)

Thanks!

On Wed, Feb 16, 2011 at 11:47 PM, Paweł Brach <[email protected]> wrote:
> I don't know what's going on but it works! Thread.sleep(100) helps!
>
> Thanks,
> Pawel
>
> 2011/2/16 Edward J. Yoon <[email protected]>
>
>> Looks like a sync problem. Can you try it again after adding a
>> Thread.sleep(100); line?
>>
>> Sent from my iPhone
>>
>> On Feb 16, 2011, at 3:24 PM, Paweł Brach <[email protected]> wrote:
>>
>> > Yes, I have of course. My cluster has been configured and both examples,
>> > PiEstimator and SerializePrinting, work (there is communication between 3
>> > nodes). I've modified your example, PiEstimator (put everything in a
>> > loop), and it works for a few iterations (there is communication), and
>> > after that the connection is lost. Then the connection is re-established
>> > but some messages are missing. It looks like the Hama framework is very
>> > unstable when it's loaded and many messages are being sent between nodes.
>> > On the same cluster I've configured Apache Hadoop and it's very stable.
>> > If you have your own cluster configured, could you run my example on it?
>> > Have you ever run something more complicated than PiEstimator and
>> > SerializePrinting on it?
>> >
>> > Cheers,
>> > Pawel
>> >
>> > 2011/2/16 Chia-Hung Lin <[email protected]>
>> >
>> >> Have you configured zookeeper in hama-site.xml? Hama makes use of
>> >> zookeeper to do node communication IIRC.
>> >>
>> >> Opening socket connection to server cl5/127.0.1.1:2181
>> >>
>> >> indicates that it seems only localhost is up. If this is the case, you
>> >> can change the hama.zookeeper.quorum property, with the value set to
>> >> e.g.
>> >>
>> >> <property>
>> >>   <name>hama.zookeeper.quorum</name>
>> >>   <value>node1,node2,node3,node4,node5</value>
>> >> </property>
>> >>
>> >> Hope it helps
>> >>
>> >> 2011/2/15 Paweł Brach <[email protected]>:
>> >>> Hello,
>> >>>
>> >>> During the last few days I've tested Hama solutions and today I found a
>> >>> strange error in the Hama framework. If you run a simple job with more
>> >>> than a few supersteps the following error occurs:
>> >>>
>> >>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
>> >>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
>> >>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>> >>> java.net.ConnectException: Connection refused
>> >>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> >>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>> >>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
>> >>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
>> >>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>> >>>
>> >>> You can reproduce that by running PiEstimator (the newest source code
>> >>> from svn) with a small change: put the whole body of the bsp() method
>> >>> in a for loop. So add the following line at the beginning:
>> >>>
>> >>> for (int j = 0; j < 100; j++) {
>> >>>   // original bsp() code
>> >>> }
>> >>>
>> >>> When I try to run it, the framework hangs and the error mentioned above
>> >>> occurs.
>> >>>
>> >>> Your help will be appreciated.
>> >>>
>> >>> Cheers,
>> >>>
>> >>> --
>> >>> Pawel Brach
>> >>>
>> >>
>> >> --
>> >> ChiaHung Lin @ nuk, tw.
>> >>
>> >
>> > --
>> > Paweł Brach
>> >
>
> --
> Paweł Brach
>

--
Best Regards, Edward J. Yoon
http://blog.udanax.org
http://twitter.com/eddieyoon
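
A minimal sketch of the workaround discussed in this thread, under stated assumptions. The thread does not say where the Thread.sleep(100) line was added, so placing it after the barrier in each iteration is a guess. The Barrier interface and runSupersteps method are hypothetical stand-ins for illustration, not Hama's actual API. A pause like this only hides a synchronization race; it does not fix it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SleepWorkaroundSketch {

    /** Hypothetical stand-in for the framework's barrier call; NOT Hama's real API. */
    public interface Barrier {
        void sync() throws Exception;
    }

    /**
     * Mirrors the loop from the bug report: run the superstep body, hit the
     * barrier, then pause briefly before the next superstep.
     * The position of the pause is an assumption.
     * Returns the number of completed supersteps.
     */
    public static int runSupersteps(int iterations, long pauseMillis,
                                    Runnable body, Barrier barrier) throws Exception {
        int completed = 0;
        for (int j = 0; j < iterations; j++) {
            body.run();                 // original bsp() code would go here
            barrier.sync();             // barrier between supersteps
            Thread.sleep(pauseMillis);  // the workaround: give the sync time to settle
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger syncs = new AtomicInteger();
        // A no-op body and a counting barrier stand in for a real BSP job.
        int done = runSupersteps(5, 100, () -> { }, syncs::incrementAndGet);
        System.out.println(done + " supersteps, " + syncs.get() + " syncs");
    }
}
```

In a real job the body and barrier would be the framework's own calls; the sketch only shows the shape of the loop with the pause in place.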
