From the code I observed, it seems that the created znodes consist of peer names only (in the form of `host:port'). Processes at different supersteps therefore share one flat namespace. While iterating over supersteps, a process that has already entered the next superstep cannot be distinguished from one still in the previous superstep, which results in the process hanging. Adding the superstep value to the name of the created znode and filtering out znodes that belong to the next superstep might solve the problem.
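As a rough sketch of the naming idea only (this is not Hama's actual code): the class `SuperstepBarrierNames`, its two methods, and the `<superstep>_<host:port>` name format are all hypothetical, made up for illustration. The `children` list stands in for whatever zk.getChildren() would return on the barrier node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hama: names barrier znodes as
// "<superstep>_<host:port>" so that different supersteps no longer
// collide in one flat namespace.
final class SuperstepBarrierNames {
    private SuperstepBarrierNames() {}

    // Build the znode name for a peer at a given superstep (assumed format).
    static String znodeName(long superstep, String peerName) {
        return superstep + "_" + peerName;
    }

    // Keep only the children that belong to the given superstep and
    // return their peer names; znodes of any other superstep are ignored.
    static List<String> peersAtSuperstep(List<String> children, long superstep) {
        String prefix = superstep + "_";
        List<String> peers = new ArrayList<>();
        for (String child : children) {
            if (child.startsWith(prefix)) {
                peers.add(child.substring(prefix.length()));
            }
        }
        return peers;
    }
}
```

With something like this, the while loop in leaveBarrier could count only the znodes of its own superstep, so a lock file created early by a faster peer for the next superstep would not keep the loop from finishing.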
But I haven't tested the code, so I may be wrong because of misunderstanding.

-----Original message-----
From:Edward J. Yoon <[email protected]>
To:[email protected]
Date:Tue, 21 Jun 2011 17:20:21 +0900
Subject:Re: Lock and Barrier Synchronization

Especially, this can be problematic when locking a large number of BSPPeers.

On Tue, Jun 21, 2011 at 5:13 PM, Edward J. Yoon <[email protected]> wrote:
> Hi all,
>
> Recently I'm looking at HAMA-387.
>
> There's some problem related with lock and barrier synchronization.
> The problem is as soon as last one of lock files deleted (before
> completely escape from while loop at leaveBarrier method), others
> begin to create their lock file. So, sometimes, it causes hang.
>
> My temporary solution is 'Thread.sleep(200);'. Good but not perfect.
> If zk.getChildren() response is slower than 200 milliseconds, process
> will be hanged.
>
> Is there any other idea?
>
> Thanks.
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

--
Best Regards, Edward J. Yoon
@eddieyoon

--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
