I'll try to have a look at the proposal from a performance point of view in the next days. Please ping me, if I don't follow up this thread.
Cheers, Fabian 2015-10-27 18:28 GMT+01:00 Martin Junghanns <m.jungha...@mailbox.org>: > Hi, > > At our group, we also moved several algorithms from Giraph to Gelly and > ran into some confusing issues (first in understanding, second during > implementation) caused by the conceptional differences you described. > > If there are no concrete advantages (performance mainly) in the Spargel > implementation, we would be very happy to see the Gelly API be aligned to > Pregel-like systems. > > Your SSSP example speaks for itself. Straightforward, if the reader is > familiar with Pregel/Giraph/... > > Best, > Martin > > > On 27.10.2015 17:40, Vasiliki Kalavri wrote: > >> Hello squirrels, >> >> I want to discuss with you a few concerns I have about our current >> vertex-centric model implementation, Spargel, now fully subsumed by Gelly. >> >> Spargel is our implementation of Pregel [1], but it violates some >> fundamental properties of the model, as described in the paper and as >> implemented in e.g. Giraph, GPS, Hama. I often find myself confused both >> when trying to explain it to current Giraph users and when porting my >> Giraph algorithms to it. >> >> More specifically: >> - in the Pregel model, messages produced in superstep n, are received in >> superstep n+1. In Spargel, they are produced and consumed in the same >> iteration. >> - in Pregel, vertices are active during a superstep, if they have received >> a message in the previous superstep. In Spargel, a vertex is active during >> a superstep if it has changed its value. >> >> These two differences require a lot of rethinking when porting >> applications >> and can easily cause bugs. >> >> The most important problem however is that we require the user to split >> the >> computation in 2 phases (2 UDFs): >> - messaging: has access to the vertex state and can produce messages >> - update: has access to incoming messages and can update the vertex value >> >> Pregel/Giraph only expose one UDF to the user: >> - compute: has access to both the vertex state and the incoming messages, >> can produce messages and update the vertex value. >> >> This might not seem like a big deal, but except from forcing the user to >> split their program logic into 2 phases, Spargel also makes some common >> computation patterns non-intuitive or impossible to write. A very simple >> example is propagating a message based on its value or sender ID. To do >> this with Spargel, one has to store all the incoming messages in the >> vertex >> value (might be of different type btw) during the messaging phase, so that >> they can be accessed during the update phase. >> >> So, my first question is, when implementing Spargel, were other >> alternatives considered and maybe rejected in favor of performance or >> because of some other reason? If someone knows, I would love to hear about >> them! >> >> Second, I wrote a prototype implementation [2] that only exposes one UDF, >> compute(), by keeping the vertex state in the solution set and the >> messages >> in the workset. This way all previously mentioned limitations go away and >> the API (see "SSSPComputeFunction" in the example [3]) looks a lot more >> like Giraph (see [4]). >> >> I have not run any experiments yet and the prototype has some ugly hacks, >> but if you think any of this makes sense, then I'd be willing to follow up >> and try to optimize it. If we see that it performs well, we can consider >> either replacing Spargel or adding it as an alternative. >> >> Thanks for reading this long e-mail and looking forward to your input! >> >> Cheers, >> -Vasia. >> >> [1]: https://kowshik.github.io/JPregel/pregel_paper.pdf >> [2]: >> >> https://github.com/vasia/flink/tree/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew >> [3]: >> >> https://github.com/vasia/flink/blob/spargel-2/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/spargelnew/example/SSSPCompute.java >> [4]: >> >> https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/SingleSourceShortestPaths.java >> >>