OK, I'll open a ticket for this issue. As the discussion about
warning vs. exception still seems open, we can move it to JIRA, where a
decision can be taken.
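
For the ticket, here is a rough sketch of the semantics voted on in the thread below: the combiner returns an Iterable that may be empty and must not contain more elements than the input. This is only an illustration, not the actual Giraph code. ListCombiner, applyCombiner and SUM are made-up names, and throwing an exception (rather than logging a warning) is just one of the two options still under discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CombinerSketch {
    /** Hypothetical interface, not the real VertexCombiner: may return 0..n messages. */
    public interface ListCombiner<I, M> {
        Iterable<M> combine(I vertexId, List<M> messages);
    }

    /** Example combiner (assumption for illustration): sums all messages into one. */
    public static final ListCombiner<Long, Integer> SUM = (vertexId, messages) -> {
        int total = 0;
        for (int m : messages) {
            total += m;
        }
        return Collections.singletonList(total);
    };

    /**
     * Applies the combiner only when there is at least one message, and
     * rejects a result that is larger than the input.
     */
    public static <I, M> List<M> applyCombiner(
            ListCombiner<I, M> combiner, I vertexId, List<M> messages) {
        // Combiners are optional, and there is nothing to combine for 0 messages.
        if (combiner == null || messages.isEmpty()) {
            return messages;
        }
        List<M> combined = new ArrayList<M>();
        for (M m : combiner.combine(vertexId, messages)) {
            combined.add(m);
        }
        // The exception variant; the alternative in the thread is a LOG.warn.
        if (combined.size() > messages.size()) {
            throw new IllegalStateException(
                "Combiner returned " + combined.size() + " messages for "
                + messages.size() + " input messages");
        }
        return combined;
    }

    public static void main(String[] args) {
        System.out.println(applyCombiner(SUM, 1L, Arrays.asList(1, 2, 3))); // prints [6]
    }
}
```

With this shape the receiving vertex always sees a plain list of messages, whether or not a combiner ran.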

On Fri, Jan 13, 2012 at 10:24 PM, Jakob Homan <jgho...@gmail.com> wrote:
> I think this is good, but really think a warning is enough, rather
> than an exception.  There's no reason to pre-emptively
>
> On Fri, Jan 13, 2012 at 1:03 PM, Sebastian Schelter <s...@apache.org> wrote:
>> +1 on Iterable <= messages.size() also from me.
>>
>>
>> On 13.01.2012 19:51, Avery Ching wrote:
>>> +1
>>>
>>> I'm fine with this.  If we agree to return an Iterable, then we should
>>> make sure to either throw if the size of the Iterable > messages.size()
>>> or at the very least LOG.warn("This combiner is likely to be implemented
>>> wrong").  I prefer an exception, since we have no use case for expanding
>>> the set of messages.
>>>
>>> Also, I'd like to have something in the javadoc saying something like
>>> "While the number of messages returned can be equal to the number of
>>> input messages, the purpose of the combiner is to reduce the number of
>>> messages from the input."
>>>
>>> Avery
>>>
>>> On 1/13/12 9:34 AM, Claudio Martella wrote:
>>>> Ok,
>>>>
>>>> I guess we can vote then about this, what do you think?
>>>> Shall we take 72h?
>>>>
>>>> I'm +1 for returning an iterable that can be empty.
>>>> I'm +1 for the returned iterable to be <= messages.size()
>>>>
>>>>
>>>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<s...@apache.org>
>>>> wrote:
>>>>> I think we should make the combiner return a list/iterable that can
>>>>> potentially be empty. However we should assume that the number of
>>>>> elements returned is smaller than or equal to the number of input
>>>>> elements (what's the use of a combiner if this is not given?). I also
>>>>> concur that the code should not depend on the combiner being applied
>>>>> (similar to the way combiners work in hadoop).
>>>>>
>>>>> --sebastian
>>>>>
>>>>> 2012/1/10 Jakob Homan<jgho...@gmail.com>:
>>>>>> A composite object would essentially be a wrapper around a list and
>>>>>> introduce the need for all vertices to be ready to extract that list
>>>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>>>> messages.  If we allow zero or one return elements, the combiner would
>>>>>> have to create a composite object with a list of those four messages,
>>>>>> whereas if we return a list, it just skips that step and returns the
>>>>>> four messages.  Additionally, the receiving vertex would have to
>>>>>> handle the possibility of a composite object every time even though
>>>>>> the combiner may or may not have been run during the superstep, or
>>>>>> even included in that job (since combiners are optional to the job
>>>>>> itself).  It would be better if one could write a Giraph application
>>>>>> that was completely agnostic of whether or not a combiner was
>>>>>> included.
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>>>> <claudio.marte...@gmail.com>  wrote:
>>>>>>> I believe the argument of not letting users shoot themselves in the
>>>>>>> foot doesn't stand :) Once you give them any API they have the power to
>>>>>>> do anything wrong, as they already can with Giraph (or anything else,
>>>>>>> for that matter), by designing an algorithm wrongly (which is what a
>>>>>>> wrong combiner would turn out to be). It's definitely true that a
>>>>>>> composite object would make the grouping (List<Group>) but I thought
>>>>>>> we were talking about simplifying life to users :). I think it would
>>>>>>> be more flexible (for the present and for the future) and also more
>>>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>>>> for free).
>>>>>>>
>>>>>>> Very cool discussion.
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jgho...@gmail.com>
>>>>>>> wrote:
>>>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>>>> so they can't send messages to other vertices.
>>>>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>>>> vertex
>>>>>>>> before processing as well).  When would they be forwarded to their
>>>>>>>> new
>>>>>>>> destinations at that point?  It would be possible to get into a
>>>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>>>> ever
>>>>>>>> actually be done.
>>>>>>>>
>>>>>>>> That being said, our inability to think of a good application doesn't
>>>>>>>> mean there won't be one in the future, and it's probably better to be
>>>>>>>> more flexible than try to impose what appears optimal now.  The
>>>>>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>>>>>> flexibility of allowing another list of messages (which may or may
>>>>>>>> not
>>>>>>>> be the same number of elements as the original, less than, or even
>>>>>>>> more than).
>>>>>>>>
>>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>> Agreed.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<ach...@apache.org>
>>>>>>>> wrote:
>>>>>>>>> The general idea of combiners is to reduce the number of messages
>>>>>>>>> sent.
>>>>>>>>>   Combiners are purely an optimization and the application should
>>>>>>>>> work
>>>>>>>>> correctly without it (since it's never guaranteed to actually be
>>>>>>>>> called).
>>>>>>>>>   Combiners can only modify the messages sent to a single vertex,
>>>>>>>>> so they
>>>>>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>>>>
>>>>>>>>> While I think that grouping behavior could actually be
>>>>>>>>> implemented within a
>>>>>>>>> message object (still reducing the number of messages to 1 or 0)
>>>>>>>>> I suppose
>>>>>>>>> that in some simple cases (i.e. grouping), it might be easier by
>>>>>>>>> doing it in
>>>>>>>>> the combiner as you both have mentioned?  The only thing I
>>>>>>>>> suppose I'm
>>>>>>>>> concerned about is letting users do something that is not optimal.
>>>>>>>>>   Generally, expanding messages is not what you want your
>>>>>>>>> combiner to do.
>>>>>>>>>   Also, since grouping behavior can be implemented in the message
>>>>>>>>> object, it
>>>>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>>>>
>>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>>>
>>>>>>>>> Avery
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>>>>> Ok, now i see where you're going. I guess that the thing here is
>>>>>>>>>> that
>>>>>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>>>>>> concretely it would probably need some local data related to D
>>>>>>>>>> (edges
>>>>>>>>>> values? vertexvalue?).
>>>>>>>>>> I also think that k > n is also possible in principle and we
>>>>>>>>>> could let
>>>>>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>>>>>> that letting the user send k messages in the combiner is useful
>>>>>>>>>> (and
>>>>>>>>>> the grouping behavior shown by the label propagation example
>>>>>>>>>> should do
>>>>>>>>>> so).
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob
>>>>>>>>>> Homan<jgho...@gmail.com>    wrote:
>>>>>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>>>>>> which would have then been sent to, say, M.  This would
>>>>>>>>>>> save the
>>>>>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not
>>>>>>>>>>> saying
>>>>>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>>>>>> that combiners can generate messages bound for vertices other
>>>>>>>>>>> than the
>>>>>>>>>>> original destination, which I don't know if that has even been
>>>>>>>>>>> discussed.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>>>>> <claudio.marte...@gmail.com>    wrote:
>>>>>>>>>>>> i'm not sure i understand what you'd save here. if the two
>>>>>>>>>>>> messages
>>>>>>>>>>>> were going to be expanded to k messages on the destination
>>>>>>>>>>>> worker D,
>>>>>>>>>>>> but you expand them on W, you end up sending k messages
>>>>>>>>>>>> instead of 2.
>>>>>>>>>>>> right?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob
>>>>>>>>>>>> Homan<jgho...@gmail.com>    wrote:
>>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>>> returned by
>>>>>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>>>>> Right.  Grouping would be the most common case.  It would be
>>>>>>>>>>>>> possible
>>>>>>>>>>>>> to be greater than n, as well.  For instance, consider two
>>>>>>>>>>>>> messages,
>>>>>>>>>>>>> both generated on the same worker (W) by two different
>>>>>>>>>>>>> vertices,
>>>>>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get
>>>>>>>>>>>>> both of
>>>>>>>>>>>>> these messages, do some work on them, as it would have
>>>>>>>>>>>>> knowledge of
>>>>>>>>>>>>> both, and generate some arbitrary number of messages bound
>>>>>>>>>>>>> for other
>>>>>>>>>>>>> vertices (thus saving the shuffle/transfer of the original
>>>>>>>>>>>>> messages).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>>>>>> <claudio.marte...@gmail.com>    wrote:
>>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>>> returned by
>>>>>>>>>>>>>> the combiner, can still be smaller than n, the size of the
>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>> parameter. as a first example, you can imagine your vertex
>>>>>>>>>>>>>> receiving
>>>>>>>>>>>>>> semantically-different classes/types of messages, and you
>>>>>>>>>>>>>> can imagine
>>>>>>>>>>>>>> willing to be summarizing them in different messages, i.e.
>>>>>>>>>>>>>> if your
>>>>>>>>>>>>>> messages come along with labels or just simply by the source
>>>>>>>>>>>>>> vertex,
>>>>>>>>>>>>>> if required by the algorithm, think of label propagation to
>>>>>>>>>>>>>> have just
>>>>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ach...@apache.org>
>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think
>>>>>>>>>>>>>>> of why I
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>> want to use a combiner to expand the number of messages.
>>>>>>>>>>>>>>> Can you?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>>> In my opinion that means reducing to a single message or
>>>>>>>>>>>>>>>>> none at
>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner
>>>>>>>>>>>>>>>> interface, for
>>>>>>>>>>>>>>>> instance, doesn't require a single  or no value to be
>>>>>>>>>>>>>>>> returned; it
>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>>>>>> adapting
>>>>>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of
>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>>>>>> <claudio.marte...@gmail.com>      wrote:
>>>>>>>>>>>>>>>>> Yes, what you say is completely reasonable, you
>>>>>>>>>>>>>>>>> convinced me :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery
>>>>>>>>>>>>>>>>> Ching<ach...@apache.org>
>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my
>>>>>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you
>>>>>>>>>>>>>>>>>> think of a
>>>>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>> more than 1 message should be returned from a combiner?
>>>>>>>>>>>>>>>>>> I know
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> returning null isn't preferable in general, but I think
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a
>>>>>>>>>>>>>>>>>> huge amount
>>>>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for
>>>>>>>>>>>>>>>>>>> combine to
>>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine
>>>>>>>>>>>>>>>>>>> returns M,
>>>>>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send
>>>>>>>>>>>>>>>>>>> any message,
>>>>>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning
>>>>>>>>>>>>>>>>>>> null should
>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here,
>>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>>>>> return Iterable<M>, basically letting it define what
>>>>>>>>>>>>>>>>>>> to combine
>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension
>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree
>>>>>>>>>>>>>>>>>>> with you
>>>>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>>>>> was more a matter of synching it with the other methods
>>>>>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job
>>>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob
>>>>>>>>>>>>>>>>>>> Homan<jgho...@gmail.com>
>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra
>>>>>>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>>> calling code (null checks, or not, since people
>>>>>>>>>>>>>>>>>>>> usually will
>>>>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining
>>>>>>>>>>>>>>>>>>>> the empty
>>>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most
>>>>>>>>>>>>>>>>>>>> combiners,
>>>>>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery
>>>>>>>>>>>>>>>>>>>> Ching<ach...@apache.org>
>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>>>>    * Combines message values for a particular vertex index.
>>>>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>>>>    * @param vertexIndex Index of the vertex getting these messages
>>>>>>>>>>>>>>>>>>>>>    * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>>>>>    * @return Message that is combined from {@link MsgList} or null if no
>>>>>>>>>>>>>>>>>>>>>    *         message it to be sent
>>>>>>>>>>>>>>>>>>>>>    * @throws IOException
>>>>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can
>>>>>>>>>>>>>>>>>>>>> return to
>>>>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to
>>>>>>>>>>>>>>>>>>>>> return null
>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a
>>>>>>>>>>>>>>>>>>>>> combiner when
>>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be
>>>>>>>>>>>>>>>>>>>>> useful.
>>>>>>>>>>>>>>>>>>>>>   Perhaps we
>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> change the javadoc to ensure that msgList must
>>>>>>>>>>>>>>>>>>>>> contain at least
>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was
>>>>>>>>>>>>>>>>>>>>>> whether we
>>>>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready
>>>>>>>>>>>>>>>>>>>>>> to receive
>>>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>>> messages, as it's the case of
>>>>>>>>>>>>>>>>>>>>>> BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In the particular example, I believe the
>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner is
>>>>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's
>>>>>>>>>>>>>>>>>>>>>> also true
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable
>>>>>>>>>>>>>>>>>>>>>> for this
>>>>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>>>>> Schelter<s...@apache.org>
>>>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there
>>>>>>>>>>>>>>>>>>>>>>> is at least
>>>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The
>>>>>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>>> invokes the combiner if at least one message
>>>>>>>>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages
>>>>>>>>>>>>>>>>>>>>>>>> and hit an
>>>>>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the
>>>>>>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even
>>>>>>>>>>>>>>>>>>>>>>>> if there
>>>>>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns
>>>>>>>>>>>>>>>>>>>>>>>> 0) and for
>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the
>>>>>>>>>>>>>>>>>>>>>>>> combiner for
>>>>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>      public Iterable<M> getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>>>>>>>>        Iterable<M> messages = inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>>>>        if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>>>>          M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>>>>          try {
>>>>>>>>>>>>>>>>>>>>>>>>            combinedMsg = combiner.combine(vertexId, messages);
>>>>>>>>>>>>>>>>>>>>>>>>          } catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>>>>            throw new RuntimeException("could not combine", e);
>>>>>>>>>>>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>>>>>>>>>>>          if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>>>>            List<M> tmp = new ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>>>>            tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>>>>            messages = tmp;
>>>>>>>>>>>>>>>>>>>>>>>>          } else {
>>>>>>>>>>>>>>>>>>>>>>>>            messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>>>>        return messages;
>>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> the Iterable returned by this method is passed to
>>>>>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner
>>>>>>>>>>>>>>>>>>>>>>>> code that
>>>>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that
>>>>>>>>>>>>>>>>>>>>>>>> calls the
>>>>>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>>>>     claudio.marte...@gmail.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>     claudio.marte...@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>     claudio.marte...@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>     Claudio Martella
>>>>>>>     claudio.marte...@gmail.com
>>>>
>>>>
>>>
>>



-- 
   Claudio Martella
   claudio.marte...@gmail.com
