[
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414729#comment-13414729
]
Eli Reisman commented on GIRAPH-249:
------------------------------------
Yes, I had to drop GIRAPH-247 and start GIRAPH-256 because of some confusion I
had over the naming scheme (or lack thereof) in that same path from InputSplit
to final loading of vertices into the workerPartitionMap. From what I can tell,
the vertices are stored in temp partitions in BspServiceWorker during load from
input splits, and are sent in bursts over netty to their final homes (even when
that is on the local worker; I will file a separate jira/patch for this Monday,
I think). This takes advantage of a single code path to get vertices to their
new owners, whether it is INPUT_SUPERSTEP or a dynamic repartitioning during
calculation supersteps. When the collections of vertices arrive at their new
home they are again placed in a Partition container (I might file a jira to
eliminate all the middleman containers for memory savings as well, since they
are not sent on the wire in Partitions) and then combined into existing
partitions in workerPartitionMap, using the correct partition ID from the list
of IDs that the worker owns according to info the master gives it at the
beginning of each superstep.
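To make the receiving side of that flow concrete, here is a rough sketch of a
worker merging incoming batches into the partitions it owns. Every name in it
(WorkerPartitionSketch, receive, size) is hypothetical and is not Giraph's
actual API, and vertices are reduced to bare long IDs to keep it short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the receiving worker; not Giraph's actual classes. */
public class WorkerPartitionSketch {
    // Partition ID -> vertex IDs held so far (a stand-in for workerPartitionMap).
    private final Map<Integer, List<Long>> workerPartitionMap = new HashMap<>();

    /** Registers the partition IDs the master assigned to this worker. */
    public WorkerPartitionSketch(Collection<Integer> ownedPartitionIds) {
        for (int id : ownedPartitionIds) {
            workerPartitionMap.put(id, new ArrayList<>());
        }
    }

    /** Merges an incoming batch of vertices into the existing owned partition. */
    public void receive(int partitionId, Collection<Long> vertexIds) {
        List<Long> partition = workerPartitionMap.get(partitionId);
        if (partition == null) {
            throw new IllegalArgumentException(
                "Worker does not own partition " + partitionId);
        }
        partition.addAll(vertexIds);
    }

    /** Returns how many vertices an owned partition currently holds. */
    public int size(int partitionId) {
        return workerPartitionMap.get(partitionId).size();
    }
}
```

The point of the sketch is only that batches arriving at different times end up
in the same long-lived partition, keyed by an ID the worker was told it owns.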
As to outgoing partitions when they are first taken from InputSplit, they are
stored in TEMPORARY (this fooled me on GIRAPH-247 for a while) Partition objects
in inputSplitCache or something like that, and only sent out on the wire to
their real home when they reach a reasonable size. This is what GIRAPH-256
improves, and I have had VERY good results testing it this weekend. Anyway, the
inputSplitCache for that Partition object is cleared for refilling as the
vertexReader continues, and the vertices are sent over the wire as a
Collection<BasicVertex>, which is also confusing since the request type is
SendPartitionRequest or something like that.
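As a rough illustration of the sending side, the sketch below buffers vertices
per destination partition and hands a batch to a sender callback once it
reaches a threshold, clearing the buffer for refilling. The class name, the
threshold parameter, and the callback are all hypothetical stand-ins (the
callback plays the role of the netty request), and vertices are again just
long IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch of a per-partition send buffer; not Giraph's actual code. */
public class InputSplitBufferSketch {
    private final int maxVerticesPerSend;                  // assumed flush threshold
    private final BiConsumer<Integer, List<Long>> sender;  // stand-in for the wire request
    private final Map<Integer, List<Long>> cache = new HashMap<>();

    public InputSplitBufferSketch(int maxVerticesPerSend,
                                  BiConsumer<Integer, List<Long>> sender) {
        this.maxVerticesPerSend = maxVerticesPerSend;
        this.sender = sender;
    }

    /** Buffers one vertex and sends the batch once it reaches the threshold. */
    public void add(int partitionId, long vertexId) {
        List<Long> buffered = cache.computeIfAbsent(partitionId, k -> new ArrayList<>());
        buffered.add(vertexId);
        if (buffered.size() >= maxVerticesPerSend) {
            flush(partitionId);
        }
    }

    /** Sends whatever is still buffered, e.g. when the reader runs out of input. */
    public void flushAll() {
        for (Integer id : new ArrayList<>(cache.keySet())) {
            flush(id);
        }
    }

    // Removes the temporary buffer so it can be refilled, then sends its contents.
    private void flush(int partitionId) {
        List<Long> buffered = cache.remove(partitionId);
        if (buffered != null && !buffered.isEmpty()) {
            sender.accept(partitionId, buffered);
        }
    }
}
```

Note that what goes to the sender is a plain collection of vertices, not the
temporary container that held them, which matches the behavior described above.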
So: no "real Partitions" are created or used for anything but temp storage
until vertices arrive at the worker who will actually host them.
I can test your patch Monday with metrics. One thing to remember, though: it
might be a hard sell to introduce disk caching at all stages of the
computation, because Giraph is based on Pregel which is strictly in-memory
processing. Once we leave behind the Pregel model entirely we might be in
danger of "feature creep" that could make it hard to sell new adopters on
Giraph, or even to explain to them how it works and why it's a correct approach.
I think being able to claim we are a "BSP implementation following the Pregel
model" will help adoption long term, just as Hadoop can claim to be a "MapReduce
implementation with a distributed filesystem based on GFS" and everyone knows
just what that is and is not.
Either way, great work, this is a lot to take on! Thanks again!
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
> Issue Type: Improvement
> Reporter: Alessandro Presta
> Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate
> issue, although the interplay between the two is crucial.
> We should also discuss what are our primary goals here: completing a job
> (albeit slowly) instead of failing when the graph is too big, while still
> encouraging memory optimizations and high-memory clusters; or restructuring
> Giraph to be as efficient as possible in disk mode, making it almost a
> standard way of operating.