Do we have logs and some details on how the job was executed (or some custom
debug output for figuring out its execution order)?

In addition, it might help to check other settings, e.g. whether /tmp is full.
I previously ran into a job that hung forever, and the cause was that /tmp was
too small. Cleaning /tmp, or switching the temp dir that bsp writes to, fixed
the problem.
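
A quick way to rule this out on each node is a small free-space check. This is
only a sketch: the function name and the 1 GiB threshold are made up for
illustration, and it looks at the system temp dir by default, which may not be
the directory bsp is actually configured to write to.

```python
import shutil
import tempfile


def tmp_space_report(path=None, min_free_bytes=1 << 30):
    """Return (free_bytes, ok) for a directory.

    path defaults to the system temp dir. min_free_bytes is an arbitrary
    1 GiB threshold picked for illustration, not a project setting.
    """
    target = path or tempfile.gettempdir()
    usage = shutil.disk_usage(target)  # named tuple: total, used, free
    return usage.free, usage.free >= min_free_bytes


if __name__ == "__main__":
    free, ok = tmp_space_report()
    print("free bytes in temp dir:", free, "OK" if ok else "LOW")
```

Pass the real temp dir as `path` if bsp is configured to write somewhere other
than the default.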

But it may also be a flaw in the code (it would be good if anyone could help
review the patch and spot issues that I did not notice). Or, if the double
barrier in fact does not work, then we need to switch to something else, e.g.
the polling that Vinod suggests.
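
To make the failure mode concrete, here is a toy double barrier (enter and
leave) around one superstep. It is an in-process stand-in built on Python
threads, not the barrier code in the patch. The function name and the
`missing` and `timeout` parameters are hypothetical. The point is only that if
one peer never reaches the barrier, the others wait forever unless the wait
has a timeout.

```python
import threading


def run_superstep(num_tasks, missing=0, timeout=1.0):
    """Simulate one superstep guarded by an enter and a leave barrier.

    `missing` tasks never start, modelling peers that died or stalled.
    Returns the sorted ids of the tasks that completed the superstep.
    """
    enter = threading.Barrier(num_tasks)
    leave = threading.Barrier(num_tasks)
    done = []
    lock = threading.Lock()

    def task(task_id):
        try:
            enter.wait(timeout=timeout)  # all peers arrive before computing
            # local computation and message exchange would go here
            leave.wait(timeout=timeout)  # all peers finish before next step
        except threading.BrokenBarrierError:
            return  # with no timeout, this task would block forever
        with lock:
            done.append(task_id)

    threads = [threading.Thread(target=task, args=(i,))
               for i in range(num_tasks - missing)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(done)


if __name__ == "__main__":
    print(run_superstep(4))                          # every peer present
    print(run_superstep(4, missing=1, timeout=0.2))  # one peer absent
```

With all peers present every task gets through both barriers. With one peer
absent, the remaining tasks give up at the enter barrier once the timeout
fires, which is the situation that shows up as a hang when there is no timeout.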

-----Original message-----
From: Edward J. Yoon <[email protected]>
To: [email protected]
Date: Fri, 23 Sep 2011 19:20:58 +0900
Subject: Re: Hang problem

P.S., Tested on 16 nodes using 10 tasks per node.

On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon <[email protected]> wrote:
> Hi,
>
> Today I ran the sssp example with 4GB sample file.
>
> At the 32nd step, some tasks finished and others hang forever.
>
> Could anyone figure out this problem?
>
> Plus, there're too many INFO-level logs. Let's reduce them.
>
> Thanks.
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
