I was under the impression that 100k was the upper limit to make things
work without crashing.

In any case, if one wanted to use a compressed memory representation by
aggregating different edge lists together, could one use the worker context
as a central point of access to the compressed graphs? I can imagine a
vertex class that has only the ID and uses the worker context to access its
edge list (i.e. it is only a client to a central per-machine repository).
Vertices in the same partition would share this data structure. Is there
any obvious technical flaw in this scheme?
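Something along these lines, as a minimal sketch (all class names here are
made up for illustration, not actual Giraph API; a real version would hook
into Giraph's Vertex and WorkerContext):

{code}
// Hypothetical sketch of the scheme above: an ID-only vertex that is
// just a client of a central per-machine edge repository.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Central per-machine repository of (possibly compressed) edge lists. */
class PerWorkerEdgeStore {
    // A real implementation would hold a compressed representation
    // (e.g. sorted, delta-encoded, varint-packed target ids).
    private final Map<Long, long[]> edgeLists = new ConcurrentHashMap<>();

    void putEdgeList(long vertexId, long[] targetIds) {
        edgeLists.put(vertexId, targetIds);
    }

    long[] getEdgeList(long vertexId) {
        return edgeLists.get(vertexId);
    }
}

/** Stand-in for the worker context: the single point of access per JVM. */
class EdgeStoreContext {
    private static final PerWorkerEdgeStore STORE = new PerWorkerEdgeStore();

    static PerWorkerEdgeStore getEdgeStore() {
        return STORE;
    }
}

/** A vertex that keeps only its id; edges are resolved via the context. */
class ClientVertex {
    private final long id;

    ClientVertex(long id) {
        this.id = id;
    }

    long[] getEdges() {
        // The vertex is only a client of the central repository.
        return EdgeStoreContext.getEdgeStore().getEdgeList(id);
    }
}
{code}

All vertices in the same partition would go through the same store, so
their edge lists could be packed together into one compressed block.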
Cheers,
--
Gianmarco


On Fri, Aug 17, 2012 at 3:18 PM, Alessandro Presta <[email protected]> wrote:

> The example where we actually go out of memory was with 500K vertices and
> 500M edges, but yes, as a general rule we should strive to reduce our
> memory footprint in order to push the point where we need to go out of
> core as far away as possible.
>
> On 8/17/12 2:11 PM, "Gianmarco De Francisci Morales" <[email protected]>
> wrote:
>
> >Very interesting.
> >
> >On a side note, a graph with 100k vertices and 100M edges is largish but
> >not that big after all.
> >If it does not fit in 10+ GB of memory, it means that each edge occupies
> >around 100B (amortizing the cost of the vertex over the edges).
> >In my opinion this deserves some thought.
> >If memory is an issue, why not think about compressed memory structures,
> >at least for common graph formats?
> >
> >Cheers,
> >--
> >Gianmarco
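To spell out the back-of-envelope arithmetic quoted above (round numbers,
nothing rigorous):

{code}
// Sanity check of the ~100 bytes per edge estimate: with 100M edges,
// the 100k vertices contribute negligibly, so their cost is amortized.
public class BytesPerEdge {
    public static void main(String[] args) {
        long edges = 100_000_000L;                   // 100M edges
        long memoryBytes = 10L * 1024 * 1024 * 1024; // 10 GB of heap
        System.out.println(memoryBytes / edges);     // prints 107
    }
}
{code}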
> >> Great metrics, this made a very interesting read, and great code too,
> >> as always. This must have been a lot of work. I like the idea of
> >> eliminating the extra temporary storage data structures where possible,
> >> even when not going out-of-core. I think that, plus avoiding extra
> >> object creation during the workflow, can still do a lot for an in-core
> >> job's memory profile. But this is looking really good, and it sounds
> >> like with the config options it's also pluggable depending on your
> >> hardware situation, so it sounds great to me. Great work!
> >>
> >> On Wed, Aug 15, 2012 at 12:23 PM, Alessandro Presta (JIRA)
> >> <[email protected]> wrote:
> >>
> >> > [
> >> > https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435437#comment-13435437
> >> > ]
> >> >
> >> > Alessandro Presta commented on GIRAPH-249:
> >> > ------------------------------------------
> >> >
> >> > Thanks Claudio, good observation.
> >> > You got me curious so I quickly ran a shortest paths benchmark.
> >> >
> >> > 500k vertices, 100 edges/vertex, 10 workers
> >> >
> >> > This is with trunk:
> >> >
> >> > {code}
> >> > hadoop jar giraph-trunk.jar
> >> > org.apache.giraph.benchmark.ShortestPathsBenchmark
> >> > -Dgiraph.useNetty=true -v -V 500000 -e 100 -w 10
> >> > {code}
> >> >
> >> > {code}
> >> > Superstep 3 (milliseconds) 5,394
> >> > Superstep 4 (milliseconds) 5,650
> >> > Superstep 23 (milliseconds) 1,100
> >> > Superstep 20 (milliseconds) 1,108
> >> > Superstep 31 (milliseconds) 1,192
> >> > Setup (milliseconds) 3,297
> >> > Shutdown (milliseconds) 2,309
> >> > Superstep 27 (milliseconds) 1,135
> >> > Superstep 7 (milliseconds) 4,476
> >> > Superstep 9 (milliseconds) 3,400
> >> > Superstep 8 (milliseconds) 4,043
> >> > Superstep 16 (milliseconds) 1,184
> >> > Superstep 14 (milliseconds) 2,251
> >> > Superstep 6 (milliseconds) 5,150
> >> > Superstep 24 (milliseconds) 1,167
> >> > Superstep 18 (milliseconds) 1,176
> >> > Superstep 5 (milliseconds) 5,483
> >> > Superstep 1 (milliseconds) 1,125
> >> > Superstep 21 (milliseconds) 1,192
> >> > Total (milliseconds) 85,757
> >> > Superstep 15 (milliseconds) 1,375
> >> > Superstep 22 (milliseconds) 1,159
> >> > Vertex input superstep (milliseconds) 11,644
> >> > Superstep 25 (milliseconds) 1,058
> >> > Superstep 17 (milliseconds) 1,075
> >> > Superstep 26 (milliseconds) 1,051
> >> > Superstep 12 (milliseconds) 2,342
> >> > Superstep 10 (milliseconds) 3,192
> >> > Superstep 19 (milliseconds) 1,092
> >> > Superstep 11 (milliseconds) 2,533
> >> > Superstep 30 (milliseconds) 1,126
> >> > Superstep 0 (milliseconds) 821
> >> > Superstep 28 (milliseconds) 1,184
> >> > Superstep 29 (milliseconds) 1,116
> >> > Superstep 2 (milliseconds) 1,165
> >> > Superstep 13 (milliseconds) 1,983
> >> > {code}
> >> >
> >> > And with 5 partitions out-of-core:
> >> >
> >> > {code}
> >> > hadoop jar giraph-249.jar
> >> > org.apache.giraph.benchmark.ShortestPathsBenchmark
> >> > -Dgiraph.useNetty=true -Dgiraph.useOutOfCoreGraph=true
> >> > -Dgiraph.maxPartitionsInMemory=5 -v -V 500000 -e 100 -w 10
> >> > {code}
> >> >
> >> > {code}
> >> > Superstep 3 (milliseconds) 27,407
> >> > Superstep 4 (milliseconds) 26,620
> >> > Superstep 23 (milliseconds) 20,906
> >> > Superstep 20 (milliseconds) 21,324
> >> > Superstep 31 (milliseconds) 21,055
> >> > Setup (milliseconds) 2,639
> >> > Superstep 7 (milliseconds) 25,819
> >> > Superstep 27 (milliseconds) 20,790
> >> > Shutdown (milliseconds) 175
> >> > Superstep 16 (milliseconds) 21,434
> >> > Superstep 8 (milliseconds) 24,434
> >> > Superstep 9 (milliseconds) 24,183
> >> > Superstep 14 (milliseconds) 22,401
> >> > Superstep 6 (milliseconds) 25,948
> >> > Superstep 24 (milliseconds) 20,968
> >> > Superstep 18 (milliseconds) 21,179
> >> > Superstep 5 (milliseconds) 27,134
> >> > Superstep 1 (milliseconds) 20,315
> >> > Superstep 21 (milliseconds) 21,442
> >> > Total (milliseconds) 729,459
> >> > Superstep 15 (milliseconds) 22,198
> >> > Superstep 22 (milliseconds) 20,875
> >> > Vertex input superstep (milliseconds) 19,595
> >> > Superstep 25 (milliseconds) 20,829
> >> > Superstep 17 (milliseconds) 21,617
> >> > Superstep 12 (milliseconds) 22,548
> >> > Superstep 26 (milliseconds) 20,763
> >> > Superstep 19 (milliseconds) 21,302
> >> > Superstep 10 (milliseconds) 23,823
> >> > Superstep 11 (milliseconds) 22,908
> >> > Superstep 0 (milliseconds) 10,836
> >> > Superstep 30 (milliseconds) 21,014
> >> > Superstep 28 (milliseconds) 21,109
> >> > Superstep 29 (milliseconds) 21,158
> >> > Superstep 13 (milliseconds) 21,974
> >> > Superstep 2 (milliseconds) 20,726
> >> > {code}
> >> >
> >> > So yeah, this is more like an order of magnitude. Of course this is
> >> > nothing scientific.
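For reference, the two totals above work out to a bit under an order of
magnitude:

{code}
// Slowdown implied by the reported wall-clock totals above.
public class Slowdown {
    public static void main(String[] args) {
        double inMemoryMs = 85_757;    // Total, trunk
        double outOfCoreMs = 729_459;  // Total, 5 partitions in memory
        System.out.printf("%.1fx%n", outOfCoreMs / inMemoryMs); // prints 8.5x
    }
}
{code}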
> >> >
> >> > > Move part of the graph out-of-core when memory is low
> >> > > -----------------------------------------------------
> >> > >
> >> > >                 Key: GIRAPH-249
> >> > >                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
> >> > >             Project: Giraph
> >> > >          Issue Type: Improvement
> >> > >            Reporter: Alessandro Presta
> >> > >            Assignee: Alessandro Presta
> >> > >         Attachments: GIRAPH-249.patch, GIRAPH-249.patch,
> >> > > GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch
> >> > >
> >> > > There has been some talk about Giraph's scaling limitations due to
> >> > > keeping the whole graph and messages in RAM.
> >> > > We need to investigate methods to fall back to disk when running out
> >> > > of memory, while gracefully degrading performance.
> >> > > This issue is for graph storage. Messages should probably be a
> >> > > separate issue, although the interplay between the two is crucial.
> >> > > We should also discuss what our primary goals are here: completing a
> >> > > job (albeit slowly) instead of failing when the graph is too big,
> >> > > while still encouraging memory optimizations and high-memory
> >> > > clusters; or restructuring Giraph to be as efficient as possible in
> >> > > disk mode, making it almost a standard way of operating.
