[
https://issues.apache.org/jira/browse/GIRAPH-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Reisman updated GIRAPH-307:
-------------------------------
Attachment: GIRAPH-307-1.patch
This patch also attempts to re-use a single LocalityInfoSorter by making it the
repository for the input split list until all splits have been read and
reserveInputSplit() returns "null" to the worker.
It passes mvn verify; I will test on a cluster ASAP and report back with results.
> InputSplit list can be long with many workers (and locality info) and should
> not be re-created every time a worker calls reserveInputSplit()
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-307
> URL: https://issues.apache.org/jira/browse/GIRAPH-307
> Project: Giraph
> Issue Type: Improvement
> Components: bsp, graph
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Priority: Minor
> Fix For: 0.2.0
>
> Attachments: GIRAPH-307-1.patch
>
>
> While instrumenting the INPUT_SUPERSTEP and watching various runs, I see that
> the input split list, which is regenerated every time a worker calls
> reserveInputSplit(), is for all intents and purposes immutable per job.
> Therefore, we can save a fair amount of memory by building the list once
> instead of re-creating it and re-querying ZooKeeper on each pass to claim
> another split. Only the reserved and finished children lists are ever
> mutated during the input phase of the job.
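To make the idea in the description concrete, here is a minimal sketch of a holder that fetches the split list once and re-uses it on every later call. It is not the code in GIRAPH-307-1.patch and not the real LocalityInfoSorter: the class name CachedSplitList, the splitLister and reserver callbacks, and getFetchCount() are all hypothetical stand-ins. In particular, splitLister stands in for whatever lists the split nodes in ZooKeeper, and reserver stands in for the attempt to claim one split against the mutable reserved/finished state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical sketch only: caches the (effectively immutable) input split
 * list so it is built once per job instead of once per reservation attempt.
 */
final class CachedSplitList {
  /** Assumed stand-in for listing the input split nodes (e.g. from ZooKeeper). */
  private final Supplier<List<String>> splitLister;
  /** Assumed stand-in for "try to claim this split"; true means we own it now. */
  private final Predicate<String> reserver;
  /** Filled on first use, then re-used for every later call. */
  private List<String> cachedSplits;
  /** Counts how often the expensive listing ran (for illustration only). */
  private int fetchCount;

  CachedSplitList(Supplier<List<String>> splitLister, Predicate<String> reserver) {
    this.splitLister = splitLister;
    this.reserver = reserver;
  }

  /**
   * Returns the path of a split this worker managed to reserve, or null when
   * no split in the cached list can be claimed any more.
   */
  String reserveInputSplit() {
    if (cachedSplits == null) {
      // Only the first call pays for building the list.
      cachedSplits = new ArrayList<>(splitLister.get());
      fetchCount++;
    }
    for (String split : cachedSplits) {
      // Reserved/finished state is the only part that changes, so it is
      // checked on every pass rather than cached.
      if (reserver.test(split)) {
        return split;
      }
    }
    return null;
  }

  int getFetchCount() {
    return fetchCount;
  }
}
```

A worker would keep calling reserveInputSplit() on the same instance until it gets null back; however many splits it claims, the listing callback runs only once. Whether the real reservation check can be expressed as a single callback like this is an assumption made for the sketch.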