On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon <[email protected]> wrote:

Hi,

Today I ran the SSSP example with a 4GB sample file.

At the 32nd step, some tasks finished and the others hang forever.

Could anyone figure out this problem?

Plus, there are too many INFO-level logs. Let's reduce them.

Thanks.

--
Best Regards, Edward J. Yoon
@eddieyoon

On Fri, Sep 23, 2011, Edward J. Yoon <[email protected]> wrote:

P.S., tested on 16 nodes using 10 tasks per node.

On Fri, Sep 23, 2011 at 8:21 PM, Thomas Jungblut <[email protected]> wrote:

I am going to write an RPC barrier sync. ZooKeeper sucks in this case.

--
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: [email protected]
private: [email protected]

2011/9/23 Edward J. Yoon <[email protected]>:

What happens if some tasks no longer need to communicate with the others?

I didn't look at the code recently, but I guess the problem is related to the comparison of the znode count and the task count.

> I am going to write an RPC barrier sync. ZooKeeper sucks in this case.

It won't be much different. Let's focus on the NG integration and the In/Output system.

2011/9/23 Thomas Jungblut <[email protected]>:

> It won't be much different.

Let's see.

On Sep 23, 2011, at 11:01 PM, Thomas Jungblut <[email protected]> wrote:

I have made a GitHub repository for that:
https://github.com/thomasjungblut/barriersync

Check it out into your Eclipse (checking out the root directory failed for whatever reason). Start the server and then the client emulator. Works like a real charm.

Please consider this as an alternative. We should not roll out a 4.0 release with a non-working barrier sync.

2011/9/23 Edward J. Yoon <[email protected]>:

As I mentioned before, it's not a ZK problem.

Moreover, we would have to implement our own mechanisms for high availability if we had our own sync master server.

Sent from my iPad

On Fri, Sep 23, 2011 at 11:59 PM, Thomas Jungblut <[email protected]> wrote:

Well, for the SSSP example that might be correct, but you faced the hanging problem in randbench, too.

> Moreover, we would have to implement our own mechanisms for high
> availability if we had our own sync master server.

+1

On Sat, Sep 24, 2011 at 12:37 AM, Edward J. Yoon <[email protected]> wrote:

According to the BSPMaster log messages, a few of the tasks finish with SUCCEEDED status during the iterations. If I remember correctly, the child processes call bspPeer.close() at the end.

Then yes, the others will hang at the step that compares the znode count against the initial task count.

I wonder what happens if some tasks no longer need to communicate with the others?

On Sat, Sep 24, 2011, Edward J. Yoon <[email protected]> wrote:

In other words, all tasks should enter the next step until the whole job has completed successfully.

On Sat, Sep 24, 2011 at 12:50 AM, Thomas Jungblut <[email protected]> wrote:

If it is still about SSSP: well, I took that into account. That is the reason why there is a master task.

There is a while(updated) loop. Updated only becomes false when globally no updates were made. Same logic in PageRank.

This is totally failsafe :p

2011/9/24 Edward J. Yoon <[email protected]>:

> There is a while(updated) loop. Updated only becomes false when
> globally no updates were made. Same logic in PageRank.

Does this mean that some processes can finish earlier than others?

2011/9/24 Thomas Jungblut <[email protected]>:

No, actually not.
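[Editor's sketch] The hang discussed in this thread is consistent with a barrier that releases only when the number of registered peers (znodes) equals the task count fixed at job submission. The following is a hypothetical, simplified model of that check, not Hama's actual code: `BarrierSketch`, `enter`, and `barrierReleased` are illustrative names, and a plain `Set` stands in for the children of the ZooKeeper barrier znode.

```java
// Hypothetical model of the barrier check described in this thread.
// NOT Hama's real implementation: the Set stands in for the children
// of the ZooKeeper barrier znode; all names are illustrative.
import java.util.HashSet;
import java.util.Set;

public class BarrierSketch {
    private final int initialTasks;                       // task count fixed at job submission
    private final Set<String> entered = new HashSet<>();  // peers registered at the barrier

    public BarrierSketch(int initialTasks) {
        this.initialTasks = initialTasks;
    }

    // A peer arrives at the barrier (creates its znode).
    public void enter(String peer) {
        entered.add(peer);
    }

    // The check the thread describes: release only when the znode
    // count matches the initial task count.
    public boolean barrierReleased() {
        return entered.size() >= initialTasks;
    }
}
```

Under this model, if a job starts with three tasks and one of them finishes early (calls bspPeer.close() and never enters the next barrier), the count stays at two, `barrierReleased()` stays false, and the remaining peers block forever, matching the "hang at the step of comparing the size of znode and initial task size" observation.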

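[Editor's sketch] Thomas's point about the master task and the while(updated) loop is that termination is decided globally: the loop flag only goes false when no peer made an update, so no peer can leave the superstep loop before the others. A minimal sketch of that aggregation, assuming a master that ORs together per-peer flags each superstep and broadcasts the result (the names are illustrative, not Hama's API):

```java
// Hypothetical sketch of the global-termination pattern described in
// this thread: the master ORs all local "updated" flags each superstep
// and broadcasts the result, so every peer sees the same loop condition.
public class GlobalLoopSketch {

    // Master-side aggregation: true if ANY peer reported an update.
    public static boolean globallyUpdated(boolean[] localUpdated) {
        for (boolean updated : localUpdated) {
            if (updated) {
                return true;
            }
        }
        return false;
    }

    // Counts the supersteps until no peer updates anything; every peer
    // stays in the loop for the same number of rounds.
    public static int supersteps(boolean[][] updatesPerRound) {
        int steps = 0;
        for (boolean[] round : updatesPerRound) {
            steps++;
            if (!globallyUpdated(round)) {
                break;
            }
        }
        return steps;
    }
}
```

Because every peer loops on the same broadcast flag, a peer whose local work is already done still keeps entering the barrier each round, which is the behavior Edward asks about and Thomas confirms with "No, actually not."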