According to the BSPMaster log messages, a few of the tasks finish with SUCCEEDED status during the iterations. If I remember correctly, the child processes call bspPeer.close() at the end.
Then yes, the others will hang at the step that compares the znode size with the initial task size. I wonder what happens if some tasks no longer need to communicate with the others?

On Fri, Sep 23, 2011 at 11:59 PM, Thomas Jungblut <[email protected]> wrote:
> Well, for SSSP example it might be correct.
> But you faced the hanging problems in randbench, too.
>
>> Moreover, we have to implement our own mechanisms for high availability if
>> we have own sync master server.
>
> +1
>
> 2011/9/23 Edward J. Yoon <[email protected]>
>
>> As I mentioned before, it's not a ZK problem.
>>
>> Moreover, we have to implement our own mechanisms for high availability if
>> we have own sync master server.
>>
>> Sent from my iPad
>>
>> On Sep 23, 2011, at 11:01 PM, Thomas Jungblut <[email protected]> wrote:
>>
>> > I have made a github for that:
>> > https://github.com/thomasjungblut/barriersync
>> >
>> > Check it out into your eclipse (the root directory failed for whatever
>> > reason).
>> > Start the server and then the clientemulator.
>> > Works like a real charm.
>> >
>> > Please consider this as an alternative. We should not roll out a 4.0 release
>> > with a not working barrier sync.
>> >
>> > 2011/9/23 Thomas Jungblut <[email protected]>
>> >
>> >>> Won't much different.
>> >>
>> >> Let's see.
>> >>
>> >> 2011/9/23 Edward J. Yoon <[email protected]>
>> >>
>> >>> What happens if some task no longer need to communicate with others?
>> >>>
>> >>> I didn't look at the code recently but I guess that the problem is
>> >>> related with comparison of znode size and task size.
>> >>>
>> >>>> I am going to write a RPC barrier sync. Zookeeper sucks in this case.
>> >>>
>> >>> Won't much different. Let's focusing on NG integration and In/Output
>> >>> system.
>> >>>
>> >>> On Fri, Sep 23, 2011 at 8:21 PM, Thomas Jungblut
>> >>> <[email protected]> wrote:
>> >>>> I am going to write a RPC barrier sync. Zookeeper sucks in this case.
>> >>>>
>> >>>> 2011/9/23 Edward J. Yoon <[email protected]>
>> >>>>
>> >>>>> P.S., Tested on 16 nodes using 10 tasks per node.
>> >>>>>
>> >>>>> On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon <[email protected]> wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Today I ran the sssp example with 4GB sample file.
>> >>>>>>
>> >>>>>> At 32th step, some tasks are finished and others hang forever.
>> >>>>>>
>> >>>>>> Could anyone figure out this problem?
>> >>>>>>
>> >>>>>> Plus, there're too many INFO-level logs. Let's reduce them.
>> >>>>>>
>> >>>>>> Thanks.
>
> --
> Thomas Jungblut
> Berlin
>
> mobile: 0170-3081070
>
> business: [email protected]
> private: [email protected]

--
Best Regards, Edward J. Yoon
@eddieyoon
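
P.S. To make the suspected failure mode concrete, here is a minimal sketch. It is not Hama's barrier code and does not touch ZooKeeper: java.util.concurrent.Phaser stands in for the barrier, and the class name BarrierSketch, the run method and all of its parameters are hypothetical. It assumes the barrier waits until the number of arrivals matches the initial task count, which is my reading of the znode comparison discussed above. Task 0 finishes early; if it leaves without deregistering, the remaining tasks wait for an arrival that never comes (here they give up after a timeout instead of hanging forever).

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; a Phaser stands in for the sync barrier.
final class BarrierSketch {

    /**
     * Runs `tasks` threads. Task 0 stops after `earlyExit` supersteps, the
     * others after `supersteps`. Returns how many tasks completed all of
     * their own supersteps without timing out at the barrier.
     */
    static int run(int tasks, int supersteps, int earlyExit, boolean deregister) {
        // Expected parties = initial task count (the assumption under test).
        Phaser barrier = new Phaser(tasks);
        AtomicInteger completed = new AtomicInteger();
        Thread[] threads = new Thread[tasks];

        for (int i = 0; i < tasks; i++) {
            final int steps = (i == 0) ? earlyExit : supersteps;
            threads[i] = new Thread(() -> {
                try {
                    for (int s = 0; s < steps; s++) {
                        int phase = barrier.arrive();
                        // Wait for everyone else, but give up after one second.
                        barrier.awaitAdvanceInterruptibly(phase, 1000, TimeUnit.MILLISECONDS);
                    }
                    if (deregister) {
                        // Tell the barrier this task will not come back.
                        barrier.arriveAndDeregister();
                    }
                    completed.incrementAndGet();
                } catch (InterruptedException | TimeoutException e) {
                    // Barrier never advanced: this task is the "hanging" one.
                }
            });
            threads[i].start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("with deregister:    " + run(4, 5, 2, true));
        System.out.println("without deregister: " + run(4, 5, 2, false));
    }
}
```

If the real barrier behaves like the second case, then a task that calls bspPeer.close() early would need some way to shrink the expected count, whatever sync implementation we end up with.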
