What are you advocating in particular?
Graph mutation should be allowed (i.e. adding vertices). We allow this
to happen through the addVertexReq() interface and through the
VertexResolver implementation (say for messages to non-existent
vertices). I can see why this would be useful. Imagine you are
computing PageRank on the web graph, where you have only a subset of
the sites but all the outlinks for each of them. It is nice to be able
to add new vertices (sites) while the application is running.
I agree that the way that vertices are created and initialized is a bit
vague. We can work on improving the interfaces if anyone has suggestions.
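
To make the web-graph case concrete, here is a minimal sketch in plain Java. It is not Giraph code: the class name, the method name and the map-based graph representation are all made up for illustration. It finds the sites that appear only as link targets, which are the vertices that would have to be created while the job runs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from Giraph.
final class MissingTargets {
    /** Returns every edge target that has no vertex record of its own. */
    static Set<String> find(Map<String, List<String>> outlinks) {
        Set<String> missing = new TreeSet<>();
        for (List<String> targets : outlinks.values()) {
            for (String target : targets) {
                // A site we link to but never crawled has no key in the map.
                if (!outlinks.containsKey(target)) {
                    missing.add(target);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Two crawled sites; c.example and d.example are only ever targets.
        Map<String, List<String>> crawled = Map.of(
            "a.example", List.of("b.example", "c.example"),
            "b.example", List.of("a.example", "d.example"));
        System.out.println(find(crawled));
    }
}
```

With implicit creation, these targets come into existence the first time a message reaches them. Without it, something like the pass above would have to run before loading, or the application would have to create the vertices explicitly.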
On 1/13/12 12:10 AM, Claudio Martella wrote:
Thanks for your feedback. I know that users can decide to drop this
behavior, but this doesn't mean that those three points don't hold, to
On Fri, Jan 13, 2012 at 8:35 AM, Avery Ching <ach...@apache.org> wrote:
You are right that vertices are created automatically when messages are sent
to non-existent vertices. But that behavior can be made
application-specific. The default resolution of mutations/messages is
handled by VertexResolver, but you are always welcome to implement your
own application-specific behavior. For instance, you might just want to drop the message. If there
is a simultaneous create/delete, you may want to always create. You have
the power to implement any behavior you want by setting the vertex resolver
Hope this helps,
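
As a rough illustration of the two policies mentioned above (drop the message; let create win over a simultaneous delete), here is a self-contained sketch. The types below are hypothetical stand-ins and are not Giraph's VertexResolver signature; they only show the shape of the decision a resolver makes for each vertex id.

```java
import java.util.List;

// Hypothetical stand-in types; this is NOT Giraph's VertexResolver API.
final class ResolverSketch {
    /** Everything known about one vertex id at the end of a superstep. */
    static final class Pending {
        final boolean exists;
        final boolean createRequested;
        final boolean removeRequested;
        final List<String> messages;

        Pending(boolean exists, boolean createRequested,
                boolean removeRequested, List<String> messages) {
            this.exists = exists;
            this.createRequested = createRequested;
            this.removeRequested = removeRequested;
            this.messages = messages;
        }
    }

    /** Decides whether the vertex should exist in the next superstep. */
    interface Resolver {
        boolean resolve(Pending p);
    }

    /** Messages alone never create a vertex; an explicit create beats a delete. */
    static final Resolver DROP_MESSAGES_CREATE_WINS = p -> {
        if (p.createRequested) return true;
        if (p.removeRequested) return false;
        return p.exists;
    };

    /** Lazy policy: a message to an unknown id brings the vertex into existence. */
    static final Resolver CREATE_ON_MESSAGE = p -> {
        if (p.createRequested) return true;
        if (p.removeRequested) return false;
        return p.exists || !p.messages.isEmpty();
    };
}
```

Under these assumptions, a message sent to an unknown id is silently discarded by the first policy and creates the vertex under the second. Both policies keep the vertex when a create and a delete arrive in the same superstep.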
On 1/12/12 3:42 PM, Claudio Martella wrote:
I have a few comments about the current design of Giraph regarding the
implicit creation of vertices.
As it's currently designed, if you send a message to a non-existent
vertex, Giraph creates it for you.
Although I can understand that this can come in handy, as it allows for
lazy dataset creation, I think it comes at some cost and I believe this
cost is bigger than the advantage:
1) it overlaps the mutation API, where a vertex can be created
explicitly when the semantics of the algorithm require it, with
knowledge about what's going on and with explicit state. This is an
ambiguous and unclear part of the API which is difficult for me to
justify and probably confusing for the user too. Which brings me to
the second point.
2) it requires a different, and partially duplicate, code path for
mutations and implicit vertex creation in our code, as is clear from
looking at BasicRPCCommunication and as I experienced myself in the
email I recently sent to the list. Which brings me to the third
point.
3) in order to manage this, for every message we have to hit, sooner
or later, the Worker's vertex set to see whether the vertex exists and
whether it should be implicitly created. This is computationally
expensive whether you use a HashMap or, for range partitioning, a
TreeMap. Also, if we're going to create more exotic
partitioning (topology-partitioning?), we're going to hit the problem
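
To illustrate the per-message check described in point 3, here is a minimal sketch. It is not taken from BasicRPCCommunication; the class, the method and the map-of-counters representation of the worker's vertices are assumptions made for the example. The same loop runs against either map type, and it pays one lookup per message: expected constant time with a HashMap, logarithmic in the number of vertices with a TreeMap.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical worker-side delivery loop; not Giraph code.
final class DeliverySketch {
    /**
     * Counts one delivered message per known target and returns how many
     * messages addressed a vertex id the worker does not hold.
     */
    static int countUnknownTargets(Map<Long, Integer> inbox, List<Long> targets) {
        int unknown = 0;
        for (Long target : targets) {
            // One map lookup per message, regardless of the map implementation.
            Integer received = inbox.get(target);
            if (received == null) {
                unknown++; // candidate for implicit vertex creation
            } else {
                inbox.put(target, received + 1);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        List<Long> targets = List.of(1L, 2L, 7L, 1L);

        Map<Long, Integer> hashed = new HashMap<>(Map.of(1L, 0, 2L, 0));
        Map<Long, Integer> ranged = new TreeMap<>(Map.of(1L, 0, 2L, 0));

        System.out.println(countUnknownTargets(hashed, targets));
        System.out.println(countUnknownTargets(ranged, targets));
    }
}
```

If the vertex set were fixed at load time, the existence branch could be dropped from this loop, although delivery would still need some way to find the target vertex.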
In general, I don't know of any graph API that doesn't require you
either to list the vertex set explicitly at load time or to create
each vertex explicitly through the API. As I said, I understand that
it allows for lazy creation of the input file, possibly with some
vertices not explicitly listed (missing as a source vertex but present
as an endpoint of an edge), but this could be really fixed robustly by a single
What do you guys think?