Ok,

I guess we can vote then about this, what do you think?
Shall we take 72h?

I'm +1 for returning an iterable that can be empty.
I'm +1 for the returned iterable to be <= messages.size()


On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter <s...@apache.org> wrote:
> I think we should make the combiner return a list/iterable that can
> potentially be empty. However we should assume that the number of
> elements returned is smaller than or equal to the number of input
> elements (whats the use of a combiner if this is not given?). I also
> concur that the code should not depend on the combiner being applied
> (similar to the way combiners work in hadoop).
>
> --sebastian
>
> 2012/1/10 Jakob Homan <jgho...@gmail.com>:
>> A composite object would essentially be a wrapper around a list and
>> introduce the need for all vertices to be ready to extract that list
>> at all times.  For instance, a combiner passed 10 messages may be able
>> to combine 7 of them but do nothing with the other three, leaving four
>> messages.  If we allow zero or one return elements, the combiner would
>> have to create a composite object with a list of those four messages,
>> whereas if we return a list, it just skips that step and returns the
>> four messages.  Additionally, the receiving vertex would have to
>> handle the possibility of a composite object every time even though
>> the combiner may or may not have been run during the superstep, or
>> even included in that job (since combiners are optional to the job
>> itself).  It would be better if one could write a Giraph application
>> that was completely agnostic of whether or not a combiner was
>> included.
>>
>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>> <claudio.marte...@gmail.com> wrote:
>>> I believe the argument of not letting users shoot their foot doesn't
>>> stand :) Once you give them any API they have the power to do anything
>>> wrong, as they already can with Giraph (or anything else for what it
>>> matters), by designing an algorithm wrongly (which would be what it
>>> would turn out to be a wrong combiner). It's definitely true that a
>>> composite object would make the grouping (List<Group>) but I thought
>>> we were talking about simplifying life to users :). I think it would
>>> be more flexible (for the present and for the future) and also more
>>> elegant,  but not necessarily a must (although it'd come practically
>>> for free).
>>>
>>> Very cool discussion.
>>>
>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan <jgho...@gmail.com> wrote:
>>>>> Combiners can only modify the messages sent to a single vertex, so they 
>>>>> can't send messages to other vertices.
>>>> Yeah, the more I've thought about this, the more problematic it would
>>>> be.  These new messages may be generated upon arrival at the
>>>> destination vertex (since combiners can be run on the receiving vertex
>>>> before processing as well).  When would they be forwarded to their new
>>>> destinations at that point?  It would be possible to get into a
>>>> feedback loop of messages jumping around before a superstep could ever
>>>> actually be done.
>>>>
>>>> That being said, our inability to think of a good application doesn't
>>>> mean there won't be one in the future, and it's probably better to be
>>>> more flexible than try to impose what appears optimal now.  The
>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>> flexibility of allowing another list of messages (which may or may not
>>>> be the same number of elements as the original, less than, or even
>>>> more than).
>>>>
>>>>>Good discussion (it's making me really think about this)!
>>>> Agreed.
>>>>
>>>>
>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching <ach...@apache.org> wrote:
>>>>> The general idea of combiners is to reduce the number of messages sent.
>>>>>  Combiners are purely an optimization and the application should work
>>>>> correctly without it (since it's never guaranteed to actually be called).
>>>>>  Combiners can only modify the messages sent to a single vertex, so they
>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>> messages) should be done by the vertex in the compute() method.
>>>>>
>>>>> While I think that grouping behavior could actually be implemented within 
>>>>> a
>>>>> message object (still reducing the number of messages to 1 or 0) I suppose
>>>>> that in some simple cases (i.e. grouping), it might be easier by doing it 
>>>>> in
>>>>> the combiner as you both have mentioned?  The only thing I suppose I'm
>>>>> concerned about is letting users do something that is not optimal.
>>>>>  Generally, expanding messages is not what you want your combiner to do.
>>>>>  Also, since grouping behavior can be implemented in the message object, 
>>>>> it
>>>>> forces users to avoid shooting themselves in the foot.
>>>>>
>>>>> Good discussion (it's making me really think about this)!
>>>>>
>>>>> Avery
>>>>>
>>>>>
>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>
>>>>>> Ok, now i see where you're going. I guess that the thing here is that
>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>> concretely it would probably need some local data related to D (edges
>>>>>> values? vertexvalue?).
>>>>>> I also think that k>  n is also possible in principle and we could let
>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>> that letting the user send k messages in the combiner is useful (and
>>>>>> the grouping behavior shown by the label propagation example should do
>>>>>> so).
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jgho...@gmail.com>  wrote:
>>>>>>>
>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>> which would have then then been sent to, say, M.  This would save the
>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>> that combiners can generate messages bound for vertices other than the
>>>>>>> original destination, which I don't know if that has even been
>>>>>>> discussed.
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>> <claudio.marte...@gmail.com>  wrote:
>>>>>>>>
>>>>>>>> i'm not sure i understand what you'd save here. if the two messages
>>>>>>>> were going to be expanded to k messages on the destination worker D,
>>>>>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>>>>>> right?
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jgho...@gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>
>>>>>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>>>>>> both generated on the same worker (W) by two two different vertices,
>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>>>>>>> these messages, do some work on them, as it would have knowledge of
>>>>>>>>> both, and generate some arbitrary number of messages bound for other
>>>>>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>> <claudio.marte...@gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>>>>>> semantically-different classes/types of messages, and you can imagine
>>>>>>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>>>>>>> messages come along with labels or just simply by the source vertex,
>>>>>>>>>> if required by the algorithm, think of label propagation to have just
>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ach...@apache.org>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think of why I
>>>>>>>>>>> would
>>>>>>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>>>>>>
>>>>>>>>>>> Avery
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>>>>>> all.
>>>>>>>>>>>>
>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, 
>>>>>>>>>>>> for
>>>>>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>>>>>> has
>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>> adapting
>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>> <claudio.marte...@gmail.com>    wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ach...@apache.org>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> means
>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>>>>>> case
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine returns 
>>>>>>>>>>>>>>> M,
>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send any 
>>>>>>>>>>>>>>> message,
>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning null 
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>> was more a matter of synching it with the other methods getting
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jgho...@gmail.com>
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>> calling code (null checks, or not, since people usually will
>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner that
>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining the empty
>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ach...@apache.org>
>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these
>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or
>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return 
>>>>>>>>>>>>>>>>> null
>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to that
>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when
>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be useful.
>>>>>>>>>>>>>>>>>  Perhaps we
>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must contain at 
>>>>>>>>>>>>>>>>> least
>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we
>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive
>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable for this
>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>> Schelter<s...@apache.org>
>>>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there is at 
>>>>>>>>>>>>>>>>>>> least
>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The 
>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message for the
>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following
>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there
>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and 
>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for
>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     public Iterable<M>        getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>>>>       Iterable<M>        messages =
>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>>>>>>>>                       combinedMsg =
>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>                       List<M>        tmp = new
>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that
>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the
>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>    Claudio Martella
>>>>>>>>>>>>>    claudio.marte...@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>    Claudio Martella
>>>>>>>>>>    claudio.marte...@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>    Claudio Martella
>>>>>>>>    claudio.marte...@gmail.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.marte...@gmail.com



-- 
   Claudio Martella
   claudio.marte...@gmail.com

Reply via email to