[
https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415352#comment-13415352
]
Eli Reisman commented on GIRAPH-249:
------------------------------------
That makes a lot of sense. I guess the key differentiator there is "do we ever
spill/reload partitions during active computation, or just at the beginning/end
of a superstep?"
It seems like if we do any of that, the fundamental Pregel basis has changed. So I
would say unless we have the graph in memory while processing happens during
each cycle of compute() calls in a superstep, we have changed our model. That's
not a bad thing, but it does represent a pretty bold change.
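To make the distinction I mean concrete, here's a rough sketch in Java with
made-up names (PartitionStore and SuperstepBoundarySwapper are hypothetical,
not actual Giraph classes): partitions stay resident for the whole run of
compute() calls and only get spilled/reloaded at the superstep boundary.
{code:java}
// Rough sketch with hypothetical names, not actual Giraph classes: the graph
// stays in memory for the whole cycle of compute() calls, and spilling or
// reloading happens only at superstep boundaries.
import java.io.IOException;
import java.util.List;

interface PartitionStore<P> {
  /** Load every partition into memory before the superstep starts. */
  List<P> loadAll() throws IOException;

  /** Spill every partition back to disk once the superstep is over. */
  void spillAll(List<P> partitions) throws IOException;
}

class SuperstepBoundarySwapper<P> {
  interface PartitionComputation<T> {
    void compute(T partition);
  }

  private final PartitionStore<P> store;

  SuperstepBoundarySwapper(PartitionStore<P> store) {
    this.store = store;
  }

  void runSuperstep(PartitionComputation<P> computation) throws IOException {
    List<P> inMemory = store.loadAll();   // everything resident...
    for (P partition : inMemory) {
      computation.compute(partition);     // ...while compute() runs...
    }
    store.spillAll(inMemory);             // ...and back to disk only afterwards.
  }
}
{code}
Anything that pulls a partition off disk in the middle of that loop is, to my
mind, the point where we've left the pure in-memory Pregel model behind.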
Have you taken a look at GraphChi? They have a fascinating method for disk
caching of large graphs used during the processing phase. Again, this model
would require radical change, but it sure looks nice on paper.
My concern (I'll check your patch now, btw) is that our metrics have clearly and
repeatedly shown that INPUT_SUPERSTEP is where the memory problems occur. So if
we target that, the problem might be solved already, or at least our scale greatly
increased with a smaller change to the codebase.
I am shocked about the combiner thing; I thought that was already done, nice
observation! By using 256, 250, and 246 this weekend I was able to dramatically
increase throughput without crashing (in fact, there were no crashes and I ran
out of test data!) on large data jobs at the same resource profile I have been
using all along.
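For context on the combiner point, this is roughly the effect I mean, sketched
with made-up interfaces rather than the exact Giraph API: messages bound for
the same vertex get merged on the sending side, so far fewer of them ever reach
Netty.
{code:java}
// Sketch of the combiner idea with made-up interfaces, not the exact Giraph
// API: messages aimed at the same target vertex are merged before they hit
// the wire.
import java.util.HashMap;
import java.util.Map;

interface MessageCombiner<I, M> {
  M combine(I targetVertexId, M existing, M incoming);
}

class CombiningSendBuffer<I> {
  private final MessageCombiner<I, Double> combiner;
  private final Map<I, Double> pending = new HashMap<I, Double>();

  CombiningSendBuffer(MessageCombiner<I, Double> combiner) {
    this.combiner = combiner;
  }

  /** Queue a message; if one is already queued for this vertex, merge them. */
  void send(I targetVertexId, Double message) {
    Double existing = pending.get(targetVertexId);
    pending.put(targetVertexId,
        existing == null ? message
                         : combiner.combine(targetVertexId, existing, message));
  }

  /** Hand back the (much smaller) set of combined messages to actually send. */
  Map<I, Double> drain() {
    Map<I, Double> out = new HashMap<I, Double>(pending);
    pending.clear();
    return out;
  }
}
{code}
With a simple sum combiner plugged in there, something like PageRank ends up
sending one value per target vertex per worker instead of one per edge.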
What I have concluded from these tests is that Netty messages (the number of them
and their size in bytes) during INPUT_SUPERSTEP, and the buffering of ever-larger
partition data incoming to each worker as the inputs are read, are the two major
memory pain points. Tuning these factors and streamlining memory use during the
transfer of vertices from the InputSplit worker that read them to the worker that
initially owns that partition could potentially solve this problem entirely, or
at least let it scale out dramatically further.
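To make the tuning point concrete, here is the kind of knob I mean, sketched
with hypothetical names rather than the real Giraph code paths: cap how many
vertices the InputSplit-reading worker buffers per destination partition before
flushing them over the wire, instead of letting the buffers grow unbounded
during INPUT_SUPERSTEP.
{code:java}
// Hypothetical sketch (names and threshold invented, not the real Giraph code
// path): the worker reading an InputSplit buffers vertices per destination
// partition and flushes whenever a buffer hits its limit, so buffers cannot
// grow without bound during INPUT_SUPERSTEP.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VertexSendBuffer<V> {
  private final int flushThreshold;
  private final Map<Integer, List<V>> byPartition =
      new HashMap<Integer, List<V>>();

  VertexSendBuffer(int flushThreshold) {
    this.flushThreshold = flushThreshold;
  }

  void add(int partitionId, V vertex) {
    List<V> buffer = byPartition.get(partitionId);
    if (buffer == null) {
      buffer = new ArrayList<V>();
      byPartition.put(partitionId, buffer);
    }
    buffer.add(vertex);
    if (buffer.size() >= flushThreshold) {
      flush(partitionId, buffer);
      byPartition.remove(partitionId);
    }
  }

  private void flush(int partitionId, List<V> vertices) {
    // Stand-in for the Netty request that ships these vertices to the worker
    // that owns the destination partition.
    System.out.printf("flushing %d vertices to partition %d%n",
        vertices.size(), partitionId);
  }
}
{code}
Lower thresholds mean more, smaller Netty requests; higher thresholds mean fewer
requests but bigger buffers held per destination, which is exactly the trade-off
I was tuning this weekend.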
I will test your disk patch now and let you know how it goes. Thanks for all
your hard work and great ideas; it's exciting to see this all shaping up so
nicely!
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
> Key: GIRAPH-249
> URL: https://issues.apache.org/jira/browse/GIRAPH-249
> Project: Giraph
> Issue Type: Improvement
> Reporter: Alessandro Presta
> Assignee: Alessandro Presta
> Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
> GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping
> the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of
> memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate
> issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job
> (albeit slowly) instead of failing when the graph is too big, while still
> encouraging memory optimizations and high-memory clusters; or restructuring
> Giraph to be as efficient as possible in disk mode, making it almost a
> standard way of operating.