[ 
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417059#comment-13417059
 ] 

Claudio Martella commented on GIRAPH-249:
-----------------------------------------

Hey guys, great work and great discussion. I'm not a big fan of out-of-core 
graphs in general, since they move us closer to MapReduce's architecture and 
we lose most of our identity. Still, it's very cool to have it and play with 
it, at least to better understand where and how we can gain performance.
With respect to the threshold and how to define the amount of data to keep in 
memory, I'd definitely go for an absolute size value (e.g. 2000 MB) rather 
than a percentage, load, or the like. The same should apply to out-of-core 
messages. I've experimented before with heuristics that automatically spill 
to disk when weak references are cleared by the GC, but that isn't completely 
reliable. I think developers and ops teams define the max heap size for their 
mappers and can reason better in terms of memory buffer sizes. That's already 
how they reason when they tune memory for MapReduce. People can easily say: 
keep at most 2 GB of graph and 2 GB of messages in memory, and spill the rest 
to disk. It's easy and reliable. That's also the general approach used for 
the HBase MemStore, for example. I know that Cassandra uses different 
heuristics, which are less deterministic and reliable, and a lot of people 
complain about the lack of control and predictability.
What do you think?
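
To make the absolute-size idea concrete, here is a rough sketch of what I 
mean. It is not taken from the patch and not Giraph API: the class name 
BoundedPartitionStore, the "oldest partition first" spill order, the temp 
directory and the file naming are all assumptions for illustration. The only 
point is that the budget is a plain byte count that the user sets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not Giraph code): keeps at most maxBytes of partitions on heap. */
class BoundedPartitionStore {
    private final long maxBytes;   // absolute budget in bytes, e.g. 2000L * 1024 * 1024
    private final Path spillDir;   // assumption: spilled partitions go to a temp directory
    private long bytesInMemory = 0;
    // Insertion-ordered map, so the oldest partition is the first one spilled.
    private final Map<Integer, byte[]> inMemory = new LinkedHashMap<>();

    BoundedPartitionStore(long maxBytes) {
        this.maxBytes = maxBytes;
        try {
            this.spillDir = Files.createTempDirectory("partition-spill");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Adds a serialized partition, then spills older ones until the budget holds. */
    void put(int partitionId, byte[] data) {
        byte[] previous = inMemory.remove(partitionId);
        if (previous != null) {
            bytesInMemory -= previous.length;
        }
        inMemory.put(partitionId, data);
        bytesInMemory += data.length;

        Iterator<Map.Entry<Integer, byte[]>> it = inMemory.entrySet().iterator();
        while (bytesInMemory > maxBytes && it.hasNext()) {
            Map.Entry<Integer, byte[]> oldest = it.next();
            writeToDisk(oldest.getKey(), oldest.getValue());
            bytesInMemory -= oldest.getValue().length;
            it.remove();
        }
    }

    /** Returns the partition from memory if present, else from disk, else null. */
    byte[] get(int partitionId) {
        byte[] data = inMemory.get(partitionId);
        if (data != null) {
            return data;
        }
        Path file = fileFor(partitionId);
        try {
            return Files.exists(file) ? Files.readAllBytes(file) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isInMemory(int partitionId) {
        return inMemory.containsKey(partitionId);
    }

    long bytesInMemory() {
        return bytesInMemory;
    }

    private Path fileFor(int partitionId) {
        return spillDir.resolve("partition-" + partitionId + ".bin");
    }

    private void writeToDisk(int partitionId, byte[] data) {
        try {
            Files.write(fileFor(partitionId), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a budget of 10 bytes, putting two 6-byte partitions leaves only the 
second one on heap and the first one on disk. The behavior depends on nothing 
but the number the user configured, which is the predictability I'm arguing 
for. A real implementation would of course need a smarter spill order and 
would have to account for object overhead, not just serialized size.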
                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, 
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping 
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of 
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate 
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job 
> (albeit slowly) instead of failing when the graph is too big, while still 
> encouraging memory optimizations and high-memory clusters; or restructuring 
> Giraph to be as efficient as possible in disk mode, making it almost a 
> standard way of operating.


