On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon <[email protected]> wrote:

Hi,

Today I ran the SSSP example with a 4GB sample file.

At the 32nd step, some tasks finished and the others hang forever.

Could anyone figure out this problem?

Plus, there are too many INFO-level logs. Let's reduce them.

Thanks.

--
Best Regards, Edward J. Yoon
@eddieyoon

On Fri, Sep 23, 2011, Edward J. Yoon <[email protected]> wrote:

P.S., tested on 16 nodes using 10 tasks per node.

On Fri, Sep 23, 2011 at 8:21 PM, Thomas Jungblut <[email protected]> wrote:

I am going to write an RPC barrier sync. ZooKeeper sucks in this case.

--
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: [email protected]
private: [email protected]

2011/9/23 Edward J. Yoon <[email protected]>:

What happens if some tasks no longer need to communicate with the others?

I didn't look at the code recently, but I guess the problem is related to the comparison of the znode count and the task count.

> I am going to write an RPC barrier sync. ZooKeeper sucks in this case.

It won't be much different. Let's focus on the NG integration and the In/Output system.

2011/9/23 Thomas Jungblut <[email protected]>:

> It won't be much different.

Let's see.

On Sep 23, 2011, at 11:01 PM, Thomas Jungblut <[email protected]> wrote:

I have made a GitHub repository for that:
https://github.com/thomasjungblut/barriersync

Check it out into your Eclipse (checking out the root directory failed for whatever reason). Start the server and then the client emulator. Works like a real charm.

Please consider this as an alternative. We should not roll out a 4.0 release with a non-working barrier sync.

2011/9/23 Edward J. Yoon <[email protected]>:

As I mentioned before, it's not a ZK problem.

Moreover, we would have to implement our own mechanisms for high availability if we had our own sync master server.

Sent from my iPad

On Fri, Sep 23, 2011 at 11:59 PM, Thomas Jungblut <[email protected]> wrote:

Well, for the SSSP example that might be correct, but you faced the hanging problem in randbench, too.

> Moreover, we would have to implement our own mechanisms for high
> availability if we had our own sync master server.

+1

On Sat, Sep 24, 2011 at 12:37 AM, Edward J. Yoon <[email protected]> wrote:

According to the BSPMaster log messages, a few of the tasks finish with SUCCEEDED status during the iterations. If I remember correctly, the child processes call bspPeer.close() at the end.

Then yes, the others will hang at the step that compares the znode count against the initial task count.

I wonder what happens if some tasks no longer need to communicate with the others?

On Sat, Sep 24, 2011, Edward J. Yoon <[email protected]> wrote:

In other words, all tasks should enter the next step until the whole job has completed successfully.

On Sat, Sep 24, 2011 at 12:50 AM, Thomas Jungblut <[email protected]> wrote:

If it is still about SSSP: well, I took that into account. That is the reason why there is a master task.

There is a while(updated) loop. Updated only becomes false when globally no updates were made. Same logic in PageRank.

This is totally failsafe :p

2011/9/24 Edward J. Yoon <[email protected]>:

> There is a while(updated) loop. Updated only becomes false when
> globally no updates were made. Same logic in PageRank.

Does this mean that some processes can finish earlier than others?

2011/9/24 Thomas Jungblut <[email protected]>:

No, actually not.
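[Editor's sketch] The hang discussed in this thread is consistent with a barrier that releases only when the number of registered peers (znodes) equals the task count fixed at job submission. The following is a hypothetical, simplified model of that check, not Hama's actual code: `BarrierSketch`, `enter`, and `barrierReleased` are illustrative names, and a plain `Set` stands in for the children of the ZooKeeper barrier znode.

```java
// Hypothetical model of the barrier check described in this thread.
// NOT Hama's real implementation: the Set stands in for the children
// of the ZooKeeper barrier znode; all names are illustrative.
import java.util.HashSet;
import java.util.Set;

public class BarrierSketch {
    private final int initialTasks;                       // task count fixed at job submission
    private final Set<String> entered = new HashSet<>();  // peers registered at the barrier

    public BarrierSketch(int initialTasks) {
        this.initialTasks = initialTasks;
    }

    // A peer arrives at the barrier (creates its znode).
    public void enter(String peer) {
        entered.add(peer);
    }

    // The check the thread describes: release only when the znode
    // count matches the initial task count.
    public boolean barrierReleased() {
        return entered.size() >= initialTasks;
    }
}
```

Under this model, if a job starts with three tasks and one of them finishes early (calls bspPeer.close() and never enters the next barrier), the count stays at two, `barrierReleased()` stays false, and the remaining peers block forever, matching the "hang at the step of comparing the size of znode and initial task size" observation.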

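[Editor's sketch] Thomas's point about the master task and the while(updated) loop is that termination is decided globally: the loop flag only goes false when no peer made an update, so no peer can leave the superstep loop before the others. A minimal sketch of that aggregation, assuming a master that ORs together per-peer flags each superstep and broadcasts the result (the names are illustrative, not Hama's API):

```java
// Hypothetical sketch of the global-termination pattern described in
// this thread: the master ORs all local "updated" flags each superstep
// and broadcasts the result, so every peer sees the same loop condition.
public class GlobalLoopSketch {

    // Master-side aggregation: true if ANY peer reported an update.
    public static boolean globallyUpdated(boolean[] localUpdated) {
        for (boolean updated : localUpdated) {
            if (updated) {
                return true;
            }
        }
        return false;
    }

    // Counts the supersteps until no peer updates anything; every peer
    // stays in the loop for the same number of rounds.
    public static int supersteps(boolean[][] updatesPerRound) {
        int steps = 0;
        for (boolean[] round : updatesPerRound) {
            steps++;
            if (!globallyUpdated(round)) {
                break;
            }
        }
        return steps;
    }
}
```

Because every peer loops on the same broadcast flag, a peer whose local work is already done still keeps entering the barrier each round, which is the behavior Edward asks about and Thomas confirms with "No, actually not."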