> - 8 GB RAM

I guess that's a typo, Minho. :-) AFAIK, each node has 192 GB of memory.
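A quick check of the arithmetic supports the typo remark. The Java sketch below is illustrative only. It assumes the worst case, where every one of the bsp.tasks.maximum=40 slots on a node runs a child JVM at its full -Xmx4096m heap. It ignores non-heap JVM overhead, so a real footprint would be somewhat larger.

```java
// Worst-case per-node heap for the settings quoted in this thread:
// bsp.tasks.maximum=40 and bsp.child.java.opts=-Xmx4096m.
// Assumption: every task slot is busy and each child JVM grows to its -Xmx limit.
// Non-heap JVM memory (metaspace, thread stacks, etc.) is not counted.
public class HeapBudget {

    // Total heap in MB if maxTasksPerNode child JVMs each use xmxMb of heap.
    static long worstCaseHeapMb(int maxTasksPerNode, long xmxMb) {
        return (long) maxTasksPerNode * xmxMb;
    }

    public static void main(String[] args) {
        long heapMb = worstCaseHeapMb(40, 4096); // 40 x 4096 MB = 163840 MB
        long ramSmallMb = 8L * 1024;             // 8 GB, as written in the report
        long ramLargeMb = 192L * 1024;           // 192 GB, as noted in the reply
        System.out.println("worst-case child heap: " + heapMb / 1024 + " GB");
        System.out.println("fits in 8 GB:   " + (heapMb <= ramSmallMb));
        System.out.println("fits in 192 GB: " + (heapMb <= ramLargeMb));
    }
}
```

This reports 160 GB of worst-case heap. That cannot fit in 8 GB, but it fits within 192 GB.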
+1, we need to increase the default maxClientCnxns, since modern machines have plenty of RAM.

On Tue, Jul 7, 2015 at 7:13 PM, 김민호 <minwise....@samsung.com> wrote:
> Hi all,
>
> Recently, I set up a Hama cluster using 2 machines.
> Their specifications are as follows:
>
> - 8 GB RAM
> - 12 TB HDD
> - (I don't remember the CPU spec.)
>
> To run Hama jobs, I set bsp.tasks.maximum=40 and
> bsp.child.java.opts=-Xmx4096m in hama-site.xml (other settings omitted).
> Then I ran the Pi Estimator and FastGraphGen examples, but got the
> errors below.
>
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/peers/cluster-0:61029
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:40 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/peers/cluster-0:61029
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:42 ERROR sync.ZKSyncClient: Error checking zk path /bsp/job_201507071627_0001/sync/-1
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/sync/-1
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:44 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/sync/-1
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:46 FATAL bsp.GroomServer: SyncError from child
> attempt_201507071627_0001_000023_0: org.apache.hama.bsp.sync.SyncException
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:138)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> 15/07/07 16:27:48 INFO bsp.BSPJobClient: Job failed.
>
> This is a ZooKeeper error: Hama tasks try to access the /bsp node in
> ZooKeeper and fail.
> It happens because hama.zookeeper.property.maxClientCnxns is set to 30 in
> hama-default.xml, so the problem appears whenever the maximum number of
> tasks (40 here) is larger than that limit.
> To solve the problem, Hama has a setting that raises the number of
> connections allowed to ZK:
>
> <property>
>   <name>hama.zookeeper.property.maxClientCnxns</name>
>   <value>100</value>
> </property>
>
> So I think we should raise the default number of connections to more than
> 100, because servers are far more capable than they used to be.
> If you agree, I will change the default value to 300.
>
> Best regards,
> Minho Kim

-- 
Best Regards,
Edward J. Yoon
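The failure condition described in the report can be expressed as a simple per-node check. This is a hypothetical helper, not part of Hama, and it rests on the following assumptions:

- Each BSP task JVM holds exactly one ZooKeeper connection. Each task registers through ZKSyncClient, as the stack trace shows.
- ZooKeeper applies maxClientCnxns per client IP address, so all tasks on one node count against the same limit.
- The `extraPerNode` parameter is an illustrative placeholder for any other ZooKeeper clients on the same host. The thread does not measure such connections.

```java
// Hypothetical check, not Hama code: would the BSP tasks on one node exceed
// ZooKeeper's maxClientCnxns, which limits concurrent connections per client IP
// to a single ZooKeeper server?
// Assumptions: one ZooKeeper connection per task JVM, plus `extraPerNode`
// connections from other clients on the same host (illustrative placeholder).
public class ZkConnectionCheck {

    // Returns true if the expected connections from one node exceed the limit.
    // A limit of 0 (or less) is treated as "no limit", matching ZooKeeper's
    // documented meaning of maxClientCnxns=0.
    static boolean exceedsLimit(int maxTasksPerNode, int extraPerNode, int maxClientCnxns) {
        if (maxClientCnxns <= 0) {
            return false;
        }
        int expected = maxTasksPerNode + extraPerNode;
        return expected > maxClientCnxns;
    }

    public static void main(String[] args) {
        int tasks = 40;                // bsp.tasks.maximum from the report
        int[] limits = {30, 100, 300}; // old default, suggested override, proposed default
        for (int limit : limits) {
            System.out.println("maxClientCnxns=" + limit + " -> exceeds: "
                    + exceedsLimit(tasks, 0, limit));
        }
    }
}
```

With bsp.tasks.maximum=40 and no extra clients, the check flags the old default of 30 and passes for both 100 and 300. Raising bsp.tasks.maximum, or counting other ZooKeeper clients on the node, shrinks that margin.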