According to the BSPMaster log messages, a few of the tasks finished
with SUCCEEDED status during the iterations. If I remember correctly,
child processes call bspPeer.close() at the end.

So yes, the others will hang at the step that compares the size of the
znode with the initial task size.

I wonder what happens if some tasks no longer need to communicate with the others?
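
To make the suspected failure mode concrete, here is a minimal sketch in plain Java. It is not the Hama or ZooKeeper sync code: the class BarrierSketch and its method names are made up for illustration, CyclicBarrier stands in for "wait until the znode size equals the initial task size", and Phaser stands in for a barrier whose expected count shrinks when a task closes. Timeouts are used only so that the "hang" ends and can be observed.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierSketch {

    // Fixed expected count: the barrier waits for all `tasks` arrivals even
    // though `finished` tasks have already exited. Returns how many of the
    // remaining tasks got past the barrier before the timeout.
    static int passedWithFixedCount(int tasks, int finished, long timeoutMs)
            throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(tasks);
        AtomicInteger passed = new AtomicInteger();
        Thread[] workers = new Thread[tasks - finished];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
                    passed.incrementAndGet();
                } catch (InterruptedException | BrokenBarrierException | TimeoutException e) {
                    // The barrier never tripped; without the timeout this task would wait forever.
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return passed.get();
    }

    // Shrinking expected count: a finished task deregisters itself, so the
    // remaining tasks only wait for each other.
    static int passedWithDeregistration(int tasks, int finished, long timeoutMs)
            throws InterruptedException {
        Phaser phaser = new Phaser(tasks);
        AtomicInteger passed = new AtomicInteger();
        for (int i = 0; i < finished; i++) {
            phaser.arriveAndDeregister(); // a task that is done leaves the barrier
        }
        Thread[] workers = new Thread[tasks - finished];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    int phase = phaser.arrive();
                    phaser.awaitAdvanceInterruptibly(phase, timeoutMs, TimeUnit.MILLISECONDS);
                    passed.incrementAndGet();
                } catch (InterruptedException | TimeoutException e) {
                    // Not expected here: the remaining tasks should release each other.
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return passed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fixed count, 1 of 3 finished: passed = "
                + passedWithFixedCount(3, 1, 200));
        System.out.println("deregistration, 1 of 3 finished: passed = "
                + passedWithDeregistration(3, 1, 2000));
    }
}
```

If the real barrier behaves like the first method, the same hang would probably show up in an RPC-based barrier as well, unless a finishing task tells the barrier that it is leaving.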

On Fri, Sep 23, 2011 at 11:59 PM, Thomas Jungblut
<[email protected]> wrote:
> Well, for the SSSP example it might be correct.
> But you faced the hanging problems in randbench, too.
>
>> Moreover, we have to implement our own mechanisms for high availability if
>> we have our own sync master server.
>>
>
> +1
>
> 2011/9/23 Edward J. Yoon <[email protected]>
>
>> As I mentioned before, it's not a ZK problem.
>>
>> Moreover, we have to implement our own mechanisms for high availability if
>> we have our own sync master server.
>>
>> Sent from my iPad
>>
>> On Sep 23, 2011, at 11:01 PM, Thomas Jungblut <
>> [email protected]> wrote:
>>
>> > I have created a GitHub repository for that:
>> > https://github.com/thomasjungblut/barriersync
>> >
>> > Check it out into your eclipse (the root directory failed for whatever
>> > reason).
>> > Start the server and then the clientemulator.
>> > Works like a real charm.
>> >
>> > Please consider this as an alternative. We should not roll out a 4.0
>> > release with a non-working barrier sync.
>> >
>> > 2011/9/23 Thomas Jungblut <[email protected]>
>> >
>> >>> It won't be much different.
>> >>
>> >> Let's see.
>> >>
>> >> 2011/9/23 Edward J. Yoon <[email protected]>
>> >>
>> >>> What happens if some tasks no longer need to communicate with the others?
>> >>>
>> >>> I haven't looked at the code recently, but I guess that the problem is
>> >>> related to the comparison of the znode size and the task size.
>> >>>
>> >>>> I am going to write an RPC barrier sync. ZooKeeper sucks in this case.
>> >>>
>> >>> It won't be much different. Let's focus on NG integration and the In/Output
>> >>> system.
>> >>>
>> >>> On Fri, Sep 23, 2011 at 8:21 PM, Thomas Jungblut
>> >>> <[email protected]> wrote:
>> >>>> I am going to write an RPC barrier sync. ZooKeeper sucks in this case.
>> >>>>
>> >>>> 2011/9/23 Edward J. Yoon <[email protected]>
>> >>>>
>> >>>>> P.S., Tested on 16 nodes using 10 tasks per node.
>> >>>>>
>> >>>>> On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon <[email protected]>
>> >>>>> wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Today I ran the SSSP example with a 4GB sample file.
>> >>>>>>
>> >>>>>> At the 32nd step, some tasks finished and the others hung forever.
>> >>>>>>
>> >>>>>> Could anyone figure out this problem?
>> >>>>>>
>> >>>>>> Plus, there are too many INFO-level logs. Let's reduce them.
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Best Regards, Edward J. Yoon
>> >>>>>> @eddieyoon
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Best Regards, Edward J. Yoon
>> >>>>> @eddieyoon
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Thomas Jungblut
>> >>>> Berlin
>> >>>>
>> >>>> mobile: 0170-3081070
>> >>>>
>> >>>> business: [email protected]
>> >>>> private: [email protected]
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>> @eddieyoon
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Thomas Jungblut
>> >> Berlin
>> >>
>> >> mobile: 0170-3081070
>> >>
>> >> business: [email protected]
>> >> private: [email protected]
>> >>
>> >
>> >
>> >
>> > --
>> > Thomas Jungblut
>> > Berlin
>> >
>> > mobile: 0170-3081070
>> >
>> > business: [email protected]
>> > private: [email protected]
>>
>
>
>
> --
> Thomas Jungblut
> Berlin
>
> mobile: 0170-3081070
>
> business: [email protected]
> private: [email protected]
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
