Hi Avery,

thanks for your feedback. I know that users can decide to drop this
behavior, but this doesn't mean that those three points don't hold, to

On Fri, Jan 13, 2012 at 8:35 AM, Avery Ching <ach...@apache.org> wrote:
> Claudio,
> You are right that vertices are created automatically when messages are sent
> to non-existent vertices.  But that behavior can be made application
> specific.  The default resolution of mutations/messages is VertexResolver.
>  But you are always welcome to implement your own application specific
> behavior.  For instance, you might just want to drop the message.  If there
> is a simultaneous create/delete, you may want to always create.  You have
> the power to implement any behavior you want by setting the vertex resolver
> (see GiraphJob#setVertexResolverClass()).
> Hope this helps,
> Avery
> On 1/12/12 3:42 PM, Claudio Martella wrote:
>> Hello Giraphers,
>> I have a few comments about the current design of Giraph regarding the
>> implicit creation of vertices.
>> As it's currently designed, if you send a message to a non-existent
>> vertex, Giraph creates it for you.
>> Although I can understand that it can be handy, as it allows for lazy
>> dataset creation, I think it comes at some cost and I believe this
>> cost is bigger than the advantage:
>> 1) it overlaps the mutation API, where a vertex can be created
>> explicitly when the semantics of the algorithm require it, with
>> knowledge about what's going on and with explicit state. This is an
>> ambiguous and unclear part of the API which is difficult for me to
>> justify and probably confusing for the user too. Which brings me to
>> the second point.
>> 2) it requires a different, and partially duplicated, code path for
>> mutations and implicit vertex creation in our code, as is clear from
>> looking at BasicRPCCommunication and as I recently experienced
>> myself, per the email I sent to the list. Which brings
>> me to the third point.
>> 3) in order to manage this, for every message we have to hit, sooner
>> or later, the Worker's vertex set to see whether the vertex exists and
>> whether it should be implicitly created. This is computationally
>> expensive whether you have a HashMap or a TreeMap
>> for range partitioning. Also, if we're going to create more exotic
>> partitioning (topology-partitioning?), we're going to hit the problem
>> more.
>> In general, I don't know of any graph API that doesn't require you
>> either to list the vertex set explicitly at load time or to create
>> each vertex explicitly through the API. As I said, I understand it
>> allows for lazy creation of the input file, possibly with some
>> vertices not explicitly listed (missing as a source vertex but
>> present as an endpoint of an edge), but this could be fixed robustly
>> by a single MapReduce job.
>> What do you guys think?
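
To make the trade-off in the thread concrete, here is a minimal,
language-neutral sketch of message delivery with a pluggable
resolution policy. It is not Giraph's API: Vertex, deliver,
create_on_message and drop_messages are hypothetical names made up
for this illustration. It only shows the two behaviors discussed
(implicit creation versus dropping the message) and the per-target
lookup in the worker's vertex set that point 3 refers to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Vertex:
    """Hypothetical stand-in for a vertex; not a Giraph class."""
    vertex_id: int
    inbox: List[str] = field(default_factory=list)


# A resolver is called only for targets that do not exist yet.
# It returns a new Vertex (implicit creation) or None (drop the messages).
Resolver = Callable[[int, List[str]], Optional[Vertex]]


def create_on_message(vertex_id: int, messages: List[str]) -> Optional[Vertex]:
    """Policy 1: implicitly create the missing vertex."""
    return Vertex(vertex_id)


def drop_messages(vertex_id: int, messages: List[str]) -> Optional[Vertex]:
    """Policy 2: do not create anything; the messages are discarded."""
    return None


def deliver(vertices: Dict[int, Vertex],
            outgoing: Dict[int, List[str]],
            resolver: Resolver) -> int:
    """Deliver messages grouped by target id; return how many were dropped."""
    dropped = 0
    for target, messages in outgoing.items():
        # This lookup in the vertex set happens for every target,
        # whichever policy is in use.
        vertex = vertices.get(target)
        if vertex is None:
            vertex = resolver(target, messages)
            if vertex is None:
                dropped += len(messages)
                continue
            vertices[target] = vertex
        vertex.inbox.extend(messages)
    return dropped
```

With create_on_message the vertex set grows as a side effect of
sending; with drop_messages it stays fixed, but the existence check
is still paid on delivery.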

   Claudio Martella
