----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7975/#review13268 -----------------------------------------------------------
The results look good. How does the no-combiner version fare in these same benchmarks? Frankly, at this point I would just go for one combiner interface (the binary one). No need to use it in the disk-backed message store (if we have one message per vertex, there is no point in spilling to disk). The only use I can think of for the Iterable->Iterable form is when you're doing some sort of top-k computation, but I'm not sure if we would reap any benefits when the graph is sparse. Regarding the interface, I think two sensible options are: 1) M combine(M a, M b) 2) M combine(Iterable<M> messages) Why 2)? We may find that when processing a list of incoming messages, calling combine() for each of them has an overhead, whereas a loop inside combine() would be faster. However, if we also use the binary combiner on the sender (which may be a good idea if it's fast enough), all the requests will have at most one message per vertex, so then 1) would be the way to go (assuming we call the combiner each time we process a request). http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java <https://reviews.apache.org/r/7975/#comment28509> Shouldn't you change the "vertex combiner" terminology to "multi combiner" across the board? http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java <https://reviews.apache.org/r/7975/#comment28527> I find it confusing that we talk about "message store per vertex" when it isn't really a MessageStore. Also, I think here you mean "a single message" instead of "a collection of messages". http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java <https://reviews.apache.org/r/7975/#comment28530> This won't be needed if we do what I suggest in the other comment (don't run the combiner on the first message). http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java <https://reviews.apache.org/r/7975/#comment28516> Do we have evidence that this in-place version is much better than the obvious M combine(M a, M b)? I understand the idea (minimizing object instantiation and garbage-collection), just want to make sure we make informed choices when going with a less-intuitive interface. I'm also suggesting that the vertex index is not needed. I see the parallel with the key in a MapReduce Reducer, but I'm not sure how it would help here to know just the vertex id but not the value. http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java <https://reviews.apache.org/r/7975/#comment28526> We could get rid of this method by using the following logic when processing messages: - if the current message is null, just set it to the incoming one - otherwise, combine them - Alessandro Presta On Nov. 9, 2012, 1:44 a.m., Maja Kabiljo wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/7975/ > ----------------------------------------------------------- > > (Updated Nov. 9, 2012, 1:44 a.m.) > > > Review request for giraph. > > > Description > ------- > > Current combiner interface is very general, but also doesn't provide the best > performance. All the combiners we currently have are binary combiners, i.e. > they can combine two messages into one. Having a lists around this simple > concept makes it slower and requires more object creations. > Adding BinaryCombiner, and a specialized message store which will be used > with it. This message store has only one message per vertex instead of having > a collection per vertex. > > > This addresses bug GIRAPH-414. > https://issues.apache.org/jira/browse/GIRAPH-414 > > > Diffs > ----- > > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java > PRE-CREATION > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java > PRE-CREATION > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java > PRE-CREATION > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java > PRE-CREATION > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java > PRE-CREATION > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java > 1406748 > > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java > 1406748 > > Diff: https://reviews.apache.org/r/7975/diff/ > > > Testing > ------- > > mvn verify > > PageRankBenchmark > 20m vertices, 100 edges per vertex, 20 workers > 1 compute thread, superstep time 55s->45s > 6 compute threads, superstep time 28s->15s > 12 compute threads, 1 netty server thread, superstep time 185s->112s > > Our internal application > Similar speedup as Page Rank > > > Thanks, > > Maja Kabiljo > >
