[
https://issues.apache.org/jira/browse/GIRAPH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Reisman updated GIRAPH-246:
-------------------------------
Attachment: GIRAPH-246-10.patch
Here's the one jakob asked for, keeps the fixes in BSWorker and leaves the rest
essentially alone. This passes mvn verify but I have not run this particular
copy on the cluster yet.
The fixes are identical to the working version I used this weekend, but nothing
about this problem but ground truth makes me feel good any more.
> Periodic worker calls to context.progress() will prevent timeout on some
> Hadoop clusters during barrier waits
> -------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-246
> URL: https://issues.apache.org/jira/browse/GIRAPH-246
> Project: Giraph
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Priority: Minor
> Labels: hadoop, patch
> Fix For: 0.2.0
>
> Attachments: GIRAPH-246-1.patch, GIRAPH-246-10.patch,
> GIRAPH-246-2.patch, GIRAPH-246-3.patch, GIRAPH-246-4.patch,
> GIRAPH-246-5.patch, GIRAPH-246-6.patch, GIRAPH-246-7.patch,
> GIRAPH-246-8.patch, GIRAPH-246-9.patch
>
>
> This simple change creates a command-line configurable option in GiraphJob to
> control the time between calls to context().progress() that allows workers to
> avoid timeouts during long data load-ins in which some works complete their
> input split reads much faster than others, or finish a super step faster. I
> found this allowed jobs that were large-scale but with low memory overhead to
> complete even when they would previously time out during runs on a Hadoop
> cluster. Timeout is still possible when the worker crashes or runs out of
> memory or has other GC or RPC trouble that is legitimate, but prevents
> unintentional crashes when the worker is actually still healthy.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira