[
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417059#comment-13417059
]
Claudio Martella commented on GIRAPH-249:
-----------------------------------------
Hey guys, great work and great discussion. I'm not a particular fan of
out-of-core graphs in general, as they move us closer to MapReduce's
architecture and we lose most of our identity. Still, it's very cool to have it
and play with it, at least to understand better where and how we can win
performance.
With respect to the threshold and the definition of the amount of data to keep
in memory, I'd definitely go for an absolute value in terms of memory size
(e.g. 2000MB) rather than a percentage, load, and whatnot. This should also be
the case for out-of-core messages. I've played before with heuristics that
automatically spill to disk when weak references are cleared by the GC, but
they are not completely reliable. I think that developers and ops teams define
the max heap size they have on their mappers and can reason better in terms of
memory buffer sizes etc. That's already how they reason when they optimize
memory for MapReduce. I think people can easily say: keep max 2GB of graph in
memory and 2GB of messages, and spill the rest to disk. It's easy and reliable.
That's also the general approach used for the HBase MemStore, for example.
I know that Cassandra has different heuristics, less deterministic and
reliable, and a lot of people are complaining about lack of control and
predictability of behavior.
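To make the absolute-size idea concrete, here is a minimal sketch in Java. The
class name SpillBudget, its methods, and the oldest-first eviction order are
all hypothetical and are not taken from Giraph or from the attached patch. It
also assumes the caller can supply an estimated size in bytes for each
partition, which a real implementation would have to work out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: tracks in-memory partitions against a fixed byte
 * budget and reports which ones should be spilled to disk.
 */
class SpillBudget {
    private final long maxBytes;          // absolute budget, e.g. 2000MB
    private long usedBytes = 0;
    // Each entry is {partitionId, estimatedSizeBytes}, oldest first.
    private final Deque<long[]> resident = new ArrayDeque<>();

    SpillBudget(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /**
     * Registers a partition as resident and returns the ids of the
     * partitions that must be spilled to get back under the budget.
     * Oldest-first eviction is an assumption made for this sketch.
     */
    List<Long> add(long partitionId, long estimatedSizeBytes) {
        resident.addLast(new long[] {partitionId, estimatedSizeBytes});
        usedBytes += estimatedSizeBytes;

        List<Long> toSpill = new ArrayList<>();
        while (usedBytes > maxBytes && !resident.isEmpty()) {
            long[] oldest = resident.removeFirst();
            usedBytes -= oldest[1];
            toSpill.add(oldest[0]);
        }
        return toSpill;
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

With this shape, one instance with a 2GB budget could guard the graph and a
second instance with its own 2GB budget could guard the messages, so each
limit is a number the operator sets directly. Note that a partition larger
than the whole budget would be reported for spilling as soon as it is added.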
What do you think?
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
> Issue Type: Improvement
> Reporter: Alessandro Presta
> Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job
> (albeit slowly) instead of failing when the graph is too big, while still
> encouraging memory optimizations and high-memory clusters; or restructuring
> Giraph to be as efficient as possible in disk mode, making it almost a
> standard way of operating.