[
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417085#comment-13417085
]
Claudio Martella commented on GIRAPH-249:
-----------------------------------------
Just to be clear, I'm cautious mostly about the out-of-core graph, not about
out-of-core messages. The graph is a first-class citizen of the framework: it
represents state and structure, and affects most of the code path.
Anyway, estimating the size of an object is not an easy task in Java, in
particular with Messages, which are user-defined and can be composed of multiple
objects. For this reason I think we have two approaches:
1) we ask the Messages to implement a sizeOf() method, following the approach
of: http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html
2) we keep the Messages in serialized format, whose size we can calculate
easily.
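
A minimal Java sketch of how the two approaches differ. All names here
(MessageSizeSketch, SizedMessage, SerializableMessage, PairMessage) are
hypothetical and not part of Giraph; SerializableMessage is only a stand-in
for the write side of Hadoop's Writable so that the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class MessageSizeSketch {
    /** Approach (1): the message reports its own size estimate (hypothetical interface). */
    interface SizedMessage {
        long sizeOf();
    }

    /** Hypothetical stand-in for the write side of Hadoop's Writable. */
    interface SerializableMessage {
        void write(DataOutput out) throws IOException;
    }

    /** Example user-defined message made of two primitive fields. */
    static final class PairMessage implements SizedMessage, SerializableMessage {
        final long sourceId;
        final double value;

        PairMessage(long sourceId, double value) {
            this.sourceId = sourceId;
            this.value = value;
        }

        @Override
        public long sizeOf() {
            // Payload bytes only; object header and padding are JVM-specific
            // and deliberately not counted here.
            return Long.BYTES + Double.BYTES;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(sourceId);
            out.writeDouble(value);
        }
    }

    /** Approach (2): serialize the message; the size is the array length. */
    static byte[] serialize(SerializableMessage message) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            message.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With (1) the number is only as good as the user's sizeOf() implementation,
while with (2) the byte count is exact for the serialized form but says
nothing about the heap footprint of the deserialized object.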
With regard to (2), one of the things we discussed at the last workshop in
Berlin, which was suggested by Owen and has also been tackled by Stanford's GPS,
is that continuous object creation puts pressure on the GC and costs quite a lot
of performance. MapReduce re-uses objects; GPS and Stratosphere keep the data in
serialized format inside byte[]. It's not something for this JIRA, but this
could be a good moment to open the appropriate ticket and start the discussion
elsewhere, as the two things go together. I'll do that.
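
A rough Java sketch of the "serialized in byte[] plus object re-use" idea,
under stated assumptions: SerializedMessageBuffer and MutableMessage are
hypothetical names, the message layout (one long, one double) is invented for
illustration, and this is not how Giraph, GPS or Stratosphere actually
implement it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SerializedMessageBuffer {
    /** Single mutable holder that is refilled for every message read. */
    static final class MutableMessage {
        long sourceId;
        double value;
    }

    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);
    private int count;

    /** Appends a message in serialized form; no per-message object is kept. */
    void add(long sourceId, double value) {
        try {
            out.writeLong(sourceId);
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        count++;
    }

    /** Exact number of bytes currently held, usable for a memory threshold. */
    int sizeInBytes() {
        return bytes.size();
    }

    /** Iterates over all messages while re-using one MutableMessage instance. */
    double sumValues() {
        MutableMessage reused = new MutableMessage();
        double sum = 0.0;
        // toByteArray() copies the buffer; a real store would read in place.
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            for (int i = 0; i < count; i++) {
                reused.sourceId = in.readLong();
                reused.value = in.readDouble();
                sum += reused.value;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }
}
```

The point of the sketch is that the buffer size is known at all times and the
number of live objects does not grow with the number of messages.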
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
> Issue Type: Improvement
> Reporter: Alessandro Presta
> Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job
> (albeit slowly) instead of failing when the graph is too big, while still
> encouraging memory optimizations and high-memory clusters; or restructuring
> Giraph to be as efficient as possible in disk mode, making it almost a
> standard way of operating.