In other words, every task should keep entering the next step until the whole job has completed successfully.
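
To make that concrete, here is a minimal sketch of the idea. It is not Hama's code and it does not use ZooKeeper: java.util.concurrent.Phaser stands in for the "znode count vs. initial task size" check, and the names BarrierSketch, run, doneAfter and keepSyncing, as well as the 500 ms timeout used to detect a hang, are all made up for illustration. Task 0 has no more work after doneAfter steps. If it keeps entering the barrier, the other tasks proceed; if it simply exits, the barrier never reaches the expected count again.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class BarrierSketch {

    // Returns true if every task got through all supersteps,
    // false if a hang was detected (via timeout) or the caller was interrupted.
    static boolean run(int tasks, int supersteps, int doneAfter, boolean keepSyncing) {
        Phaser barrier = new Phaser(tasks);      // fixed party count, like the initial task size
        AtomicBoolean hung = new AtomicBoolean(false);
        Thread[] workers = new Thread[tasks];

        for (int i = 0; i < tasks; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int step = 0; step < supersteps; step++) {
                    boolean finished = (id == 0 && step >= doneAfter);
                    if (finished && !keepSyncing) {
                        return;                  // task leaves; the count is never reached again
                    }
                    // A real task would compute and send messages here unless finished.
                    int phase = barrier.arrive();
                    try {
                        barrier.awaitAdvanceInterruptibly(phase, 500, TimeUnit.MILLISECONDS);
                    } catch (TimeoutException | InterruptedException e) {
                        hung.set(true);          // stand-in for "hangs forever"
                        return;
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return !hung.get();
    }

    public static void main(String[] args) {
        System.out.println("finished task keeps syncing: " + run(4, 5, 2, true));
        System.out.println("finished task exits early:   " + run(4, 5, 2, false));
    }
}
```

The other way out would be to shrink the expected count when a task leaves (Phaser has arriveAndDeregister() for that), but with a znode-count comparison that would mean changing the expected task size during the job, so keeping every task in the barrier looks like the simpler rule.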

On Sat, Sep 24, 2011 at 12:37 AM, Edward J. Yoon <[email protected]> wrote:
> According to BSPMaster log messages, a few tasks of all are finished
> with SUCCEEDED status during the iterations. If I remember correctly,
> child processes calls bspPeer.close() finally.
>
> Then yes, others will be hanged at the step of comparing the size of
> znode and initial task size.
>
> I wonder what happens if some task no longer need to communicate with others?
>
> On Fri, Sep 23, 2011 at 11:59 PM, Thomas Jungblut
> <[email protected]> wrote:
>> Well, for SSSP example it might be correct.
>> But you faced the hanging problems in randbench, too.
>>
>>> Moreover, we have to implement our own mechanisms for high availability if
>>> we have own sync master server.
>>>
>>
>> +1
>>
>> 2011/9/23 Edward J. Yoon <[email protected]>
>>
>>> As I mentioned before, it's not a ZK problem.
>>>
>>> Moreover, we have to implement our own mechanisms for high availability if
>>> we have own sync master server.
>>>
>>> Sent from my iPad
>>>
>>> On Sep 23, 2011, at 11:01 PM, Thomas Jungblut <[email protected]> wrote:
>>>
>>> > I have made a github for that:
>>> > https://github.com/thomasjungblut/barriersync
>>> >
>>> > Check it out into your eclipse (the root directory failed for whatever
>>> > reason).
>>> > Start the server and then the clientemulator.
>>> > Works like a real charm.
>>> >
>>> > Please consider this as an alternative. We should not roll out a 4.0 release
>>> > with a not working barrier sync.
>>> >
>>> > 2011/9/23 Thomas Jungblut <[email protected]>
>>> >
>>> >> Won't much different.
>>> >>>
>>> >>
>>> >> Let's see.
>>> >>
>>> >> 2011/9/23 Edward J. Yoon <[email protected]>
>>> >>
>>> >>> What happens if some task no longer need to communicate with others?
>>> >>>
>>> >>> I didn't look at the code recently but I guess that the problem is
>>> >>> related with comparison of znode size and task size.
>>> >>>
>>> >>>> I am going to write a RPC barrier sync. Zookeeper sucks in this case.
>>> >>>
>>> >>> Won't much different. Let's focusing on NG integration and In/Output
>>> >>> system.
>>> >>>
>>> >>> On Fri, Sep 23, 2011 at 8:21 PM, Thomas Jungblut
>>> >>> <[email protected]> wrote:
>>> >>>> I am going to write a RPC barrier sync. Zookeeper sucks in this case.
>>> >>>>
>>> >>>> 2011/9/23 Edward J. Yoon <[email protected]>
>>> >>>>
>>> >>>>> P.S., Tested on 16 nodes using 10 tasks per node.
>>> >>>>>
>>> >>>>> On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon <[email protected]>
>>> >>>>> wrote:
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> Today I ran the sssp example with 4GB sample file.
>>> >>>>>>
>>> >>>>>> At 32th step, some tasks are finished and others hang forever.
>>> >>>>>>
>>> >>>>>> Could anyone figure out this problem?
>>> >>>>>>
>>> >>>>>> Plus, there're too many INFO-level logs. Let's reduce them.
>>> >>>>>>
>>> >>>>>> Thanks.
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Best Regards, Edward J. Yoon
>>> >>>>>> @eddieyoon
>>> >>>>
>>> >>>> --
>>> >>>> Thomas Jungblut
>>> >>>> Berlin
>>> >>>>
>>> >>>> mobile: 0170-3081070
>>> >>>>
>>> >>>> business: [email protected]
>>> >>>> private: [email protected]

--
Best Regards, Edward J. Yoon
@eddieyoon
