The general idea of combiners is to reduce the number of messages sent. Combiners are purely an optimization and the application should work correctly without it (since it's never guaranteed to actually be called). Combiners can only modify the messages sent to a single vertex, so they can't send messages to other vertices. Any other work (i.e. sending messages) should be done by the vertex in the compute() method.

While I think that grouping behavior could actually be implemented within a message object (still reducing the number of messages to 1 or 0) I suppose that in some simple cases (i.e. grouping), it might be easier by doing it in the combiner as you both have mentioned? The only thing I suppose I'm concerned about is letting users do something that is not optimal. Generally, expanding messages is not what you want your combiner to do. Also, since grouping behavior can be implemented in the message object, it forces users to avoid shooting themselves in the foot.

Good discussion (it's making me really think about this)!


On 1/10/12 10:32 AM, Claudio Martella wrote:
Ok, now i see where you're going. I guess that the thing here is that
the combiner would "act" like (on its behalf) D, and to do so
concretely it would probably need some local data related to D (edges
values? vertexvalue?).
I also think that k>  n is also possible in principle and we could let
the user decide whether to use this power or not, once/if we agree
that letting the user send k messages in the combiner is useful (and
the grouping behavior shown by the label propagation example should do

On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<>  wrote:
Those two messages would have gone to D, been expanded to, say, 4,
which would have then then been sent to, say, M.  This would save the
sending of the two to D and send the 4 directly to M.  I'm not saying
it's a great example, but it is legal.  This is of course assuming
that combiners can generate messages bound for vertices other than the
original destination, which I don't know if that has even been

On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
<>  wrote:
i'm not sure i understand what you'd save here. if the two messages
were going to be expanded to k messages on the destination worker D,
but you expand them on W, you end up sending k messages instead of 2.

On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<>  wrote:
it doesn't have to be expand, k, the number of elements returned by
the combiner, can still be smaller than n,
Right.  Grouping would be the most common case.  It would be possible
to be great than k, as well.  For instance, consider two messages,
both generated on the same worker (W) by two two different vertices,
both bound for another vertex, Z.  A combiner on W could get both of
these messages, do some work on them, as it would have knowledge of
both, and generate some arbitrary number of messages bound for other
vertices (thus saving the shuffle/transfer of the original messages).

On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
<>  wrote:
it doesn't have to be expand, k, the number of elements returned by
the combiner, can still be smaller than n, the size of the messages
parameter. as a first example, you can imagine your vertex receiving
semantically-different classes/types of messages, and you can imagine
willing to be summarizing them in different messages, i.e. if your
messages come along with labels or just simply by the source vertex,
if required by the algorithm, think of label propagation to have just
an example, or some sort of labeled-pagerank.

On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<>  wrote:
I agree that C&A doesn't require it, however, I can't think of why I would
want to use a combiner to expand the number of messages.  Can you?


On 1/9/12 3:57 PM, Jakob Homan wrote:
In my opinion that means reducing to a single message or none at all.
C&A doesn't require this, however.  Hadoop's combiner interface, for
instance, doesn't require a single  or no value to be returned; it has
the same interface as a reducer, zero or more values.  Would adapting
the semantics of Giraph's combiner to return a list of messages
(possibly empty) make it more useful?

On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
<>    wrote:
Yes, what is you say is completely reasonable, you convinced me :)

On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<>    wrote:
Combiners should be commutative and associative.  In my opinion that
reducing to a single message or none at all.  Can you think of a case
more than 1 message should be returned from a combiner?  I know that
returning null isn't preferable in general, but I think that
(returning no messages), is nice to have and isn't a huge amount of work
our side.


On 1/9/12 12:13 PM, Claudio Martella wrote:
To clarify, I was not discussing the possibility for combine to return
null. I see why it would be useful, given that combine returns M,
there's no other way to let combiner ask not to send any message,
although i agree with Jakob, I also believe returning null should be
avoided but only used, roughly, as an init value for a
Perhaps, we could, but i'm just thinking out loud here, let combine()
return Iterable<M>, basicallly letting it define what to combine to
({0, 1, k } messages). It would be a powerful extension to the model,
but maybe it's too much.

As far as the size of the messages parameter, I agree with you that 0
messages gives nothing to combine and it would be somehow awkward, it
was more a matter of synching it with the other methods getting the
messages parameter.
Probably, having a more clear javadoc will do the job here.

What do you think?

On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<>
I'm not a big fan of returning null as it adds extra complexity to the
calling code (null checks, or not, since people usually will forget
them).  Avery is correct that combiners are application specific.  Is
it conceivable that one would want to write a combiner that returned
something for an input of no parameters, ie combining the empty list
doesn't return the empty list?  I imagine for most combiners,
combining a single message would result in that message.

On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<>
The javadoc for VertexCombiner#combine() is

   * Combines message values for a particular vertex index.
   * @param vertexIndex Index of the vertex getting these messages
   * @param msgList List of the messages to be combined
   * @return Message that is combined from {@link MsgList} or null if
   *         message it to be sent
   * @throws IOException

I think we are somewhat vague on what a combiner can return to
various use cases.  A combiner should be particular to a particular
compute() algorithm.  I think it should be legal to return null from
combiner, in that case, no message should be sent to that vertex.

It seems like it would be an overhead to call a combiner when there
messages.  I can't see a case where that would be useful.  Perhaps we
change the javadoc to insure that msgList must contain at least one
to have combine() being called.


On 1/9/12 5:37 AM, Claudio Martella wrote:
Hi Sebastian,

yes, that was my point, I agree completely with you.
Fixing my test was not the issue, my question was whether we want to
define explicitly the semantics of this scenario.
Personally, I believe the combiner should be ready to receive 0
messages, as it's the case of BasicVertex::initialize(),
and compute(), and act accordingly.

In the particular example, I believe the SimpleSumCombiner is
It's true that the sum of no values is 0, but it's also true that
null return semantics of combine() is more suitable for this exact

On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<>
I think we currently implicitly assume that there is at least one
element in the Iterable passed to the combiner. The messaging code
invokes the combiner only if at least one message for the target
has been sent.

However, we should not rely on implicit implementation details but
explicitly specify the semantics of combiners.


On 09.01.2012 13:29, Claudio Martella wrote:
Hello list,

for GIRAPH-45 I'm touching the incoming messages and hit an
interesting problem with the combiner semantics.
currently, my code fails testBspCombiner for the following reason:

SimpleSumCombiner::compute() returns a value even if there are no
messages in the iterator (in this case it returns 0) and for this
reason the vertices get activated at each superstep.

At each superstep, under-the-hood, I pass the combiner for each
an Iterable, which can be empty:

     public Iterable<M>        getMessages(I vertexId) {
       Iterable<M>        messages =
       if (combiner != null) {
               M combinedMsg;
               try {
                       combinedMsg = combiner.combine(vertexId,
               }  catch (IOException e) {
                       throw new RuntimeException("could not
               if (combinedMsg != null) {
                       List<M>        tmp = new ArrayList<M>(1);
                       messages = tmp;
               } else {
                       messages = new ArrayList<M>(0);
       return messages;

the Iterable returned by this methods is passed to
basicVertex.putMessages() right before the compute().
Now, the question is: who's wrong? The combiner code that returns
sum of 0 over no values, or the framework that calls the combiner
0 messages?

    Claudio Martella

    Claudio Martella

    Claudio Martella

Reply via email to