[ 
https://issues.apache.org/jira/browse/GIRAPH-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423143#comment-13423143
 ] 

Eli Reisman commented on GIRAPH-267:
------------------------------------

Sorry, I'll explain better; I was in the middle of two conversations at once 
last night! I meant that the other patch didn't pass the context into new parts 
of the code, or put the context inside waitMsec within waitForever. It kept the 
fix localized to BspServiceWorker during INPUT_SUPERSTEP, as this was where the 
problem happened no matter how much data I shoveled in at the beginning. I 
explicitly called waitMsec there so that all the progress calls were in one 
place and you could see where and how often they were needed. I had been asked 
repeatedly why progress calls were needed at all, and the need for and 
placement of the calls was hard to understand for folks who had not run into 
this problem yet. Once the load-in was done, I never saw supersteps take very 
long, so the problem seemed self-contained. I'm surprised no committers left me 
a comment if that solution was unsavory. I agree this cuts down on repetition 
in the code! This is a great fix, nice work Jaeho!
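
For anyone who has not hit this yet, here is a minimal sketch of the idea, not 
the actual Giraph code or either patch. ProgressReportingLock, its fields, and 
the Runnable callback are all hypothetical names; the callback stands in for 
whatever reports status to Hadoop (for example the task context's progress() 
call). The point is that waitForever is built from bounded waitMsec calls and 
reports progress between them, so an idle worker is not killed for silence.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a predicate lock that reports progress while waiting. */
public class ProgressReportingLock {
    private final Lock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean eventOccurred = false;

    /** Assumed callback; in a real job this would report status to Hadoop. */
    private final Runnable progress;
    /** Assumed interval; must be well below the task timeout. */
    private final long intervalMsec;

    public ProgressReportingLock(Runnable progress, long intervalMsec) {
        this.progress = progress;
        this.intervalMsec = intervalMsec;
    }

    /** Mark the event as having occurred and wake up all waiters. */
    public void signal() {
        lock.lock();
        try {
            eventOccurred = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Wait at most msec milliseconds; returns true if the event occurred. */
    public boolean waitMsec(long msec) throws InterruptedException {
        lock.lock();
        try {
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(msec);
            // Loop to guard against spurious wakeups.
            while (!eventOccurred && remainingNanos > 0) {
                remainingNanos = cond.awaitNanos(remainingNanos);
            }
            return eventOccurred;
        } finally {
            lock.unlock();
        }
    }

    /** Wait until the event occurs, reporting progress after every bounded wait. */
    public void waitForever() throws InterruptedException {
        while (!waitMsec(intervalMsec)) {
            progress.run();
        }
    }
}
```

With the reporting inside the lock, callers just call waitForever() and do not 
need their own waitMsec loops; the trade-off is that the progress calls are no 
longer visible at the call site.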

                
> Jobs can get killed for not reporting status during INPUT SUPERSTEP
> -------------------------------------------------------------------
>
>                 Key: GIRAPH-267
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-267
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 0.2.0
>         Environment: Facebook Hadoop
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>             Fix For: 0.2.0
>
>         Attachments: 
> 0001-Made-PredicateLock-report-progress-and-removed-Conte.patch, 
> GIRAPH-267.patch, GIRAPH-267.patch
>
>
> A job with a skewed and long (>600 secs in my case) INPUT_SUPERSTEP fails 
> because some tasks do not report their status.  From 
> BspServiceWorker#setup(), I could tell that while some workers were still 
> loading inputSplits, others finished theirs early, hung on 
> PredicateLock#waitForever(), and got killed after the timeout.
