[
https://issues.apache.org/jira/browse/GIRAPH-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jaeho Shin updated GIRAPH-274:
------------------------------
Attachment: GIRAPH-274.patch
Here is our patch, which adds several progress() calls after a careful code
review with Greg Malewicz. It seems the missing progress() call in
BspServiceWorker#reserveInputSplit() was causing the timeout for idle workers
during the INPUT_SUPERSTEP. There are many more spots where Giraph makes a
blocking call, but for those we only left comments, because we had no access
to either the Context or the source code. This seems to be an endless effort,
and it will only pollute Giraph's codebase as we try to fix more timeout cases.
We definitely need a better, more systematic way to keep our Giraph jobs from
timing out. One possibility is to run a separate thread from GraphMapper#run()
that reports progress as long as the task doesn't crash, so that we can stop
worrying about calling progress(). Do you think this is a good idea? Will this
cause any trouble in the underlying Hadoop/MapReduce stack? If we're using
MapReduce only for scheduling resources, then I believe there is no reason for
us to conform to the MapReduce convention of not using threads.
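
To make the separate-thread idea concrete, here is a minimal sketch. It is not
part of the attached patch. The class name ProgressHeartbeat, its start()
method, and the interval parameter are hypothetical, and the sketch takes a
plain Runnable so that it runs without Hadoop on the classpath. Inside
GraphMapper#run() the Runnable would presumably be context::progress, which is
an assumption about how it would be wired in.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: invokes a progress callback at a fixed interval from a
 * daemon thread until close() is called.
 */
final class ProgressHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    private ProgressHeartbeat(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    /**
     * Starts the heartbeat. progressCall stands in for the task's progress()
     * method; intervalMillis should be well below the task timeout.
     */
    static ProgressHeartbeat start(Runnable progressCall, long intervalMillis) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "progress-heartbeat");
                // Daemon thread: it must never keep the task JVM alive.
                t.setDaemon(true);
                return t;
            });
        // Note: if progressCall throws, the executor cancels all later runs,
        // so the heartbeat stops silently.
        scheduler.scheduleAtFixedRate(
            progressCall, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return new ProgressHeartbeat(scheduler);
    }

    /** Stops the heartbeat and waits briefly for an in-flight call to end. */
    @Override
    public void close() {
        scheduler.shutdownNow();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller.
            Thread.currentThread().interrupt();
        }
    }
}
```

The intended use would be a try-with-resources block around the body of run(),
so the heartbeat stops when the task finishes or throws. One trade-off to keep
in mind: a task that hangs without crashing would keep reporting progress too,
so the task timeout would no longer catch it.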
> Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
> --------------------------------------------------------------
>
> Key: GIRAPH-274
> URL: https://issues.apache.org/jira/browse/GIRAPH-274
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Jaeho Shin
> Assignee: Jaeho Shin
> Fix For: 0.2.0
>
> Attachments: GIRAPH-274.patch
>
>
> Even after GIRAPH-267, jobs were failing during INPUT_SUPERSTEP when some
> workers didn't get to reserve an input split while others were loading
> vertices for a long time. (Related to GIRAPH-246 and GIRAPH-267.)