I think this is good, but really think a warning is enough, rather
than an exception.  There's no reason to pre-emptively

On Fri, Jan 13, 2012 at 1:03 PM, Sebastian Schelter <s...@apache.org> wrote:
> +1 on Iterable <= messages.size() also from me.
>
>
> On 13.01.2012 19:51, Avery Ching wrote:
>> +1
>>
>> I'm fine with this.  If we agree to return an Iterable, then we should
>> make sure to either throw if the size of the Iterable > messages.size()
>> to at the very least LOG.warn("This combiner is likely to be implemented
>> wrong").  I prefer an exception, since we have no use case for expanding
>> the set of messages.
>>
>> Also, I'd like to have something in the javadoc saying something like
>> "While the number of messages returned can be equal to the same number
>> of messages that was inputted, the purpose of the combiner is to reduced
>> the number of messages from the input."
>>
>> Avery
>>
>> On 1/13/12 9:34 AM, Claudio Martella wrote:
>>> Ok,
>>>
>>> I guess we can vote then about this, what do you think?
>>> Shall we take 72h?
>>>
>>> I'm +1 for returning an iterable that can be empty.
>>> I'm +1 for the returned iterable to be<= messages.size()
>>>
>>>
>>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<s...@apache.org>
>>> wrote:
>>>> I think we should make the combiner return a list/iterable that can
>>>> potentially be empty. However we should assume that the number of
>>>> elements returned is smaller than or equal to the number of input
>>>> elements (whats the use of a combiner if this is not given?). I also
>>>> concur that the code should not depend on the combiner being applied
>>>> (similar to the way combiners work in hadoop).
>>>>
>>>> --sebastian
>>>>
>>>> 2012/1/10 Jakob Homan<jgho...@gmail.com>:
>>>>> A composite object would essentially be a wrapper around a list and
>>>>> introduce the need for all vertices to be ready to extract that list
>>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>>> messages.  If we allow zero or one return elements, the combiner would
>>>>> have to create a composite object with a list of those four messages,
>>>>> whereas if we return a list, it just skips that step and returns the
>>>>> four messages.  Additionally, the receiving vertex would have to
>>>>> handle the possibility of a composite object every time even though
>>>>> the combiner may or may not have been run during the superstep, or
>>>>> even included in that job (since combiners are optional to the job
>>>>> itself).  It would be better if one could write a Giraph application
>>>>> that was completely agnostic of whether or not a combiner was
>>>>> included.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>>> <claudio.marte...@gmail.com>  wrote:
>>>>>> I believe the argument of not letting users shoot their foot doesn't
>>>>>> stand :) Once you give them any API they have the power to do anything
>>>>>> wrong, as they already can with Giraph (or anything else for what it
>>>>>> matters), by designing an algorithm wrongly (which would be what it
>>>>>> would turn out to be a wrong combiner). It's definitely true that a
>>>>>> composite object would make the grouping (List<Group>) but I thought
>>>>>> we were talking about simplifying life to users :). I think it would
>>>>>> be more flexible (for the present and for the future) and also more
>>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>>> for free).
>>>>>>
>>>>>> Very cool discussion.
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jgho...@gmail.com>
>>>>>> wrote:
>>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>>> so they can't send messages to other vertices.
>>>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>>> vertex
>>>>>>> before processing as well).  When would they be forwarded to their
>>>>>>> new
>>>>>>> destinations at that point?  It would be possible to get into a
>>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>>> ever
>>>>>>> actually be done.
>>>>>>>
>>>>>>> That being said, our inability to think of a good application doesn't
>>>>>>> mean there won't be one in the future, and it's probably better to be
>>>>>>> more flexible than try to impose what appears optimal now.  The
>>>>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>>>>> flexibility of allowing another list of messages (which may or may
>>>>>>> not
>>>>>>> be the same number of elements as the original, less than, or even
>>>>>>> more than).
>>>>>>>
>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>> Agreed.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<ach...@apache.org>
>>>>>>> wrote:
>>>>>>>> The general idea of combiners is to reduce the number of messages
>>>>>>>> sent.
>>>>>>>>   Combiners are purely an optimization and the application should
>>>>>>>> work
>>>>>>>> correctly without it (since it's never guaranteed to actually be
>>>>>>>> called).
>>>>>>>>   Combiners can only modify the messages sent to a single vertex,
>>>>>>>> so they
>>>>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>>>
>>>>>>>> While I think that grouping behavior could actually be
>>>>>>>> implemented within a
>>>>>>>> message object (still reducing the number of messages to 1 or 0)
>>>>>>>> I suppose
>>>>>>>> that in some simple cases (i.e. grouping), it might be easier by
>>>>>>>> doing it in
>>>>>>>> the combiner as you both have mentioned?  The only thing I
>>>>>>>> suppose I'm
>>>>>>>> concerned about is letting users do something that is not optimal.
>>>>>>>>   Generally, expanding messages is not what you want your
>>>>>>>> combiner to do.
>>>>>>>>   Also, since grouping behavior can be implemented in the message
>>>>>>>> object, it
>>>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>>>
>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>>>> Ok, now i see where you're going. I guess that the thing here is
>>>>>>>>> that
>>>>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>>>>> concretely it would probably need some local data related to D
>>>>>>>>> (edges
>>>>>>>>> values? vertexvalue?).
>>>>>>>>> I also think that k>    n is also possible in principle and we
>>>>>>>>> could let
>>>>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>>>>> that letting the user send k messages in the combiner is useful
>>>>>>>>> (and
>>>>>>>>> the grouping behavior shown by the label propagation example
>>>>>>>>> should do
>>>>>>>>> so).
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob
>>>>>>>>> Homan<jgho...@gmail.com>    wrote:
>>>>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>>>>> which would have then then been sent to, say, M.  This would
>>>>>>>>>> save the
>>>>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not
>>>>>>>>>> saying
>>>>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>>>>> that combiners can generate messages bound for vertices other
>>>>>>>>>> than the
>>>>>>>>>> original destination, which I don't know if that has even been
>>>>>>>>>> discussed.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>>>> <claudio.marte...@gmail.com>    wrote:
>>>>>>>>>>> i'm not sure i understand what you'd save here. if the two
>>>>>>>>>>> messages
>>>>>>>>>>> were going to be expanded to k messages on the destination
>>>>>>>>>>> worker D,
>>>>>>>>>>> but you expand them on W, you end up sending k messages
>>>>>>>>>>> instead of 2.
>>>>>>>>>>> right?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob
>>>>>>>>>>> Homan<jgho...@gmail.com>    wrote:
>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>> returned by
>>>>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>>>> Right.  Grouping would be the most common case.  It would be
>>>>>>>>>>>> possible
>>>>>>>>>>>> to be great than k, as well.  For instance, consider two
>>>>>>>>>>>> messages,
>>>>>>>>>>>> both generated on the same worker (W) by two two different
>>>>>>>>>>>> vertices,
>>>>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get
>>>>>>>>>>>> both of
>>>>>>>>>>>> these messages, do some work on them, as it would have
>>>>>>>>>>>> knowledge of
>>>>>>>>>>>> both, and generate some arbitrary number of messages bound
>>>>>>>>>>>> for other
>>>>>>>>>>>> vertices (thus saving the shuffle/transfer of the original
>>>>>>>>>>>> messages).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>>>>> <claudio.marte...@gmail.com>    wrote:
>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>> returned by
>>>>>>>>>>>>> the combiner, can still be smaller than n, the size of the
>>>>>>>>>>>>> messages
>>>>>>>>>>>>> parameter. as a first example, you can imagine your vertex
>>>>>>>>>>>>> receiving
>>>>>>>>>>>>> semantically-different classes/types of messages, and you
>>>>>>>>>>>>> can imagine
>>>>>>>>>>>>> willing to be summarizing them in different messages, i.e.
>>>>>>>>>>>>> if your
>>>>>>>>>>>>> messages come along with labels or just simply by the source
>>>>>>>>>>>>> vertex,
>>>>>>>>>>>>> if required by the algorithm, think of label propagation to
>>>>>>>>>>>>> have just
>>>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ach...@apache.org>
>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think
>>>>>>>>>>>>>> of why I
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>> want to use a combiner to expand the number of messages.
>>>>>>>>>>>>>> Can you?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>> In my opinion that means reducing to a single message or
>>>>>>>>>>>>>>>> none at
>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner
>>>>>>>>>>>>>>> interface, for
>>>>>>>>>>>>>>> instance, doesn't require a single  or no value to be
>>>>>>>>>>>>>>> returned; it
>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>>>>> adapting
>>>>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of
>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>>>>> <claudio.marte...@gmail.com>      wrote:
>>>>>>>>>>>>>>>> Yes, what is you say is completely reasonable, you
>>>>>>>>>>>>>>>> convinced me :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery
>>>>>>>>>>>>>>>> Ching<ach...@apache.org>
>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my
>>>>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you
>>>>>>>>>>>>>>>>> think of a
>>>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>> more than 1 message should be returned from a combiner?
>>>>>>>>>>>>>>>>> I know
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> returning null isn't preferable in general, but I think
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a
>>>>>>>>>>>>>>>>> huge amount
>>>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for
>>>>>>>>>>>>>>>>>> combine to
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine
>>>>>>>>>>>>>>>>>> returns M,
>>>>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send
>>>>>>>>>>>>>>>>>> any message,
>>>>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning
>>>>>>>>>>>>>>>>>> null should
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here,
>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what
>>>>>>>>>>>>>>>>>> to combine
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension
>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree
>>>>>>>>>>>>>>>>>> with you
>>>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>>>> was more a matter of synching it with the other methods
>>>>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job
>>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob
>>>>>>>>>>>>>>>>>> Homan<jgho...@gmail.com>
>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra
>>>>>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>> calling code (null checks, or not, since people
>>>>>>>>>>>>>>>>>>> usually will
>>>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining
>>>>>>>>>>>>>>>>>>> the empty
>>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most
>>>>>>>>>>>>>>>>>>> combiners,
>>>>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery
>>>>>>>>>>>>>>>>>>> Ching<ach...@apache.org>
>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>>>    * Combines message values for a particular vertex
>>>>>>>>>>>>>>>>>>>> index.
>>>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>>>    * @param vertexIndex Index of the vertex getting
>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>>    * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>>>>    * @return Message that is combined from {@link
>>>>>>>>>>>>>>>>>>>> MsgList} or
>>>>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>    *         message it to be sent
>>>>>>>>>>>>>>>>>>>>    * @throws IOException
>>>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can
>>>>>>>>>>>>>>>>>>>> return to
>>>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to
>>>>>>>>>>>>>>>>>>>> return null
>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a
>>>>>>>>>>>>>>>>>>>> combiner when
>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be
>>>>>>>>>>>>>>>>>>>> useful.
>>>>>>>>>>>>>>>>>>>>   Perhaps we
>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must
>>>>>>>>>>>>>>>>>>>> contain at least
>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was
>>>>>>>>>>>>>>>>>>>>> whether we
>>>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready
>>>>>>>>>>>>>>>>>>>>> to receive
>>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>> messages, as it's the case of
>>>>>>>>>>>>>>>>>>>>> BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In the particular example, I believe the
>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner is
>>>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's
>>>>>>>>>>>>>>>>>>>>> also true
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable
>>>>>>>>>>>>>>>>>>>>> for this
>>>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>>>> Schelter<s...@apache.org>
>>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there
>>>>>>>>>>>>>>>>>>>>>> is at least
>>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The
>>>>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message
>>>>>>>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages
>>>>>>>>>>>>>>>>>>>>>>> and hit an
>>>>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the
>>>>>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even
>>>>>>>>>>>>>>>>>>>>>>> if there
>>>>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns
>>>>>>>>>>>>>>>>>>>>>>> 0) and for
>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the
>>>>>>>>>>>>>>>>>>>>>>> combiner for
>>>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>      public Iterable<M>          getMessages(I
>>>>>>>>>>>>>>>>>>>>>>> vertexId) {
>>>>>>>>>>>>>>>>>>>>>>>        Iterable<M>          messages =
>>>>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>>>        if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>>>                M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>>>                try {
>>>>>>>>>>>>>>>>>>>>>>>                        combinedMsg =
>>>>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>>>                        throw new
>>>>>>>>>>>>>>>>>>>>>>> RuntimeException("could not
>>>>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>>>                        List<M>          tmp = new
>>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>>>                        messages = tmp;
>>>>>>>>>>>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>>>>>>>>>>>                        messages = new
>>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>>>        return messages;
>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner
>>>>>>>>>>>>>>>>>>>>>>> code that
>>>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that
>>>>>>>>>>>>>>>>>>>>>>> calls the
>>>>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>>>     claudio.marte...@gmail.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>     claudio.marte...@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>     claudio.marte...@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>     Claudio Martella
>>>>>>     claudio.marte...@gmail.com
>>>
>>>
>>
>

Reply via email to