Inline responses.

Happy Friday,


On 1/13/12 10:51 AM, Claudio Martella wrote:
Hi Avery,

thanks for your feedback.

I'm advocating for allowing mutations only through the Mutable interface
methods. I agree that implicit vertex creation can come in handy, for
the reason you mentioned (which I called lazy input-set creation), but
you can obtain the same result through a single M/R job run in
advance.
I think this is pretty expensive (extra MR job). Users do have this option, but I doubt many would take it when they don't have to.

What we win back is that we avoid the computational cost and code
complexity of checking, for each message we receive, whether the
vertex already exists.
Checking if the vertex exists is pretty cheap in a hashmap (constant time). We should verify that this is a computational overhead (maybe some profiling) before optimizing it. I suppose we could add a switch to bypass any graph mutation in general.
You know what I mean?
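
To make the two positions above concrete, here is a minimal sketch of message delivery with a switch for implicit vertex creation. WorkerVertexStore and all of its methods are hypothetical names invented for this illustration; they are not Giraph classes, and real message delivery in Giraph is more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a worker's vertex set; NOT a Giraph class.
class WorkerVertexStore {
    // Maps a vertex id to the messages queued for that vertex.
    private final Map<Long, List<String>> inbox = new HashMap<>();
    // The "switch": when false, messages never create vertices.
    private final boolean allowImplicitCreation;
    private int dropped = 0;

    WorkerVertexStore(boolean allowImplicitCreation) {
        this.allowImplicitCreation = allowImplicitCreation;
    }

    // Explicit creation, in the spirit of a mutation API call.
    void addVertex(long id) {
        inbox.putIfAbsent(id, new ArrayList<>());
    }

    // One hash lookup per message; the same lookup finds the
    // target's queue and tells us whether the vertex exists.
    void deliver(long targetId, String message) {
        List<String> messages = inbox.get(targetId);
        if (messages == null) {
            if (!allowImplicitCreation) {
                dropped++; // target unknown and creation disabled
                return;
            }
            messages = new ArrayList<>();
            inbox.put(targetId, messages); // implicit creation
        }
        messages.add(message);
    }

    boolean hasVertex(long id) {
        return inbox.containsKey(id);
    }

    int messageCount(long id) {
        List<String> messages = inbox.get(id);
        return messages == null ? 0 : messages.size();
    }

    int droppedCount() {
        return dropped;
    }
}
```

In this sketch the existence check is the same lookup that delivery needs anyway, so the switch changes behavior rather than the number of lookups. Whether that holds for the real code path is something profiling would have to show.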

On Fri, Jan 13, 2012 at 7:44 PM, Avery Ching wrote:

What are you advocating in particular?

Graph mutation should be allowed (i.e. adding vertices).  We allow this to
happen through the addVertexReq() interface and through the VertexResolver
implementation (say for messages to non-existent vertices).  I can see why
this would be useful.  Imagine you are computing page rank on the web graph,
but you only have a subset of the sites, but all the outlinks for each site.
  It is nice to be able to allow new vertices (sites) while running the

I agree that the way that vertices are created and initialized is a bit
vague.  We can work on improving the interfaces if anyone has suggestions.


On 1/13/12 12:10 AM, Claudio Martella wrote:
Hi Avery,

thanks for your feedback. I know that users can decide to drop this
behavior, but this doesn't mean that those three points don't hold, to

On Fri, Jan 13, 2012 at 8:35 AM, Avery Ching wrote:

You are right that vertices are created automatically when messages are
sent to non-existent vertices.  But that behavior can be made application
specific.  The default resolution of mutations/messages is
  But you are always welcome to implement your own application specific
behavior.  For instance, you might just want to drop the message.  If there
is a simultaneous create/delete, you may want to always create.  You have
the power to implement any behavior you want by setting the vertex resolver
(see GiraphJob#setVertexResolverClass()).
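
As a rough illustration of the two policies mentioned above (drop messages to missing vertices, and let create win over delete), here is a self-contained sketch. SimpleResolver and DropMessagesResolver are hypothetical types written for this example, with an invented signature; they do not reproduce Giraph's VertexResolver interface.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified resolver contract; NOT Giraph's VertexResolver.
interface SimpleResolver {
    // existingValue is null when the vertex does not exist yet.
    // Returns the vertex value after resolution, or empty if the
    // vertex should not exist afterwards.
    Optional<String> resolve(long vertexId,
                             String existingValue,
                             boolean createRequested,
                             boolean removeRequested,
                             List<String> messages);
}

// Policy: messages alone never create a vertex, and a simultaneous
// create/delete always resolves to create.
class DropMessagesResolver implements SimpleResolver {
    @Override
    public Optional<String> resolve(long vertexId,
                                    String existingValue,
                                    boolean createRequested,
                                    boolean removeRequested,
                                    List<String> messages) {
        if (createRequested) {
            // Create wins, even if a delete arrived in the same step.
            return Optional.of(existingValue != null ? existingValue : "new");
        }
        if (removeRequested) {
            return Optional.empty();
        }
        // No mutation requested: keep the vertex if it exists. For a
        // missing vertex this returns empty, so its messages are dropped.
        return Optional.ofNullable(existingValue);
    }
}
```

An application wanting the opposite behavior would supply a different implementation of the same contract, which is the kind of per-application choice the resolver hook is meant to allow.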

Hope this helps,


On 1/12/12 3:42 PM, Claudio Martella wrote:
Hello Giraphers,

I have a few comments about the current design of Giraph regarding the
implicit creation of vertices.
As it's currently designed, if you send a message to a non-existent
vertex, Giraph creates it for you.
Although I can understand it can come in handy, as it allows for lazy
dataset creation, I think it comes at some cost, and I believe this
cost is bigger than the advantage:

1) it overlaps the mutation API, where a vertex can be created
explicitly when the semantics of the algorithm require it, with
knowledge about what's going on and with explicit state. This is an
ambiguous and unclear part of the API which is difficult for me to
justify and probably confusing for the user too. Which brings me to
the second point.

2) it requires a different, and partially duplicate, code path for
mutations and implicit vertex creation in our code, as is clear from
looking at BasicRPCCommunication and as I experienced myself, per the
email I recently sent to the list. Which brings me to the third
point.

3) in order to manage this, for every message we have to hit, sooner
or later, the Worker's vertex set to see whether the vertex exists and
whether it should be implicitly created. This is computationally
expensive whether you have a HashMap or a TreeMap
for range partitioning. Also, if we're going to create more exotic
partitioning (topology-partitioning?), we're going to hit the problem

In general, I don't know of any graph API that doesn't require you
either to list the vertex set explicitly at load time or to create
vertices explicitly through the API. As I said, I understand it allows
for lazy creation of the input file, where some vertices may not be
listed explicitly (missing as a source vertex but existing as an
endpoint of an edge), but this could be fixed robustly by a single
MapReduce job.
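
The preprocessing step described above can be sketched as follows. This is an in-memory stand-in for what would be a MapReduce job on real data; VertexSetCompleter and its complete() method are hypothetical names for this illustration, and the adjacency-map input format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; an in-memory analogue of a preprocessing job.
class VertexSetCompleter {
    // Input: source vertex id -> list of target vertex ids.
    // Output: the same graph, plus an entry with no out-edges for
    // every vertex that appears only as an edge endpoint.
    static Map<Long, List<Long>> complete(Map<Long, List<Long>> adjacency) {
        Map<Long, List<Long>> result = new TreeMap<>();
        // Copy every explicitly listed vertex with its out-edges.
        for (Map.Entry<Long, List<Long>> entry : adjacency.entrySet()) {
            result.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        // Add each missing endpoint as a vertex with an empty edge list.
        for (List<Long> targets : adjacency.values()) {
            for (Long target : targets) {
                result.putIfAbsent(target, new ArrayList<>());
            }
        }
        return result;
    }
}
```

With the vertex set completed up front, no message can target a vertex that is absent from the input, at the price of one extra pass over the data.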

What do you guys think?
