Thanks for your feedback.
I'm advocating for allowing mutations only through the Mutable interface
methods. I agree that implicit vertex creation can come in handy, for the
reason you mentioned (which I called lazy input-set creation), but you can
get the same result with a single, simple M/R job run in advance. What we
gain in return is that we avoid the computational cost and code complexity
of checking, for each message we receive, whether the vertex already
exists.
You know what I mean?
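A minimal sketch of the cost in question, assuming a heavily simplified worker model. All class, field, and method names below are hypothetical and do not come from Giraph's code base. Messages are buffered by destination id without consulting the vertex set; only the implicit-creation mode needs the extra pass that checks every destination against the vertex set and creates what is missing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of one worker's superstep.
 * None of these names come from Giraph.
 */
public class ImplicitCreationSketch {
    final Map<Long, Double> vertices = new HashMap<>();       // vertex id -> value
    final Map<Long, List<Double>> messages = new HashMap<>(); // vertex id -> inbox
    int existenceChecks = 0;                                  // extra work of the implicit path

    /** Buffer a message by destination id without consulting the vertex set. */
    void send(long target, double msg) {
        messages.computeIfAbsent(target, k -> new ArrayList<>()).add(msg);
    }

    /** Extra pass, needed only if messages may implicitly create vertices. */
    void createMissingVertices(double defaultValue) {
        for (Long target : messages.keySet()) {
            existenceChecks++;
            vertices.putIfAbsent(target, defaultValue);
        }
    }

    /**
     * Each existing vertex pulls its own inbox; "compute" here just adds the
     * messages to the vertex value. Returns how many inboxes had no vertex.
     */
    int computeSuperstep() {
        for (Map.Entry<Long, Double> v : vertices.entrySet()) {
            List<Double> inbox = messages.remove(v.getKey());
            if (inbox == null) {
                continue;
            }
            double sum = 0.0;
            for (double m : inbox) {
                sum += m;
            }
            v.setValue(v.getValue() + sum);
        }
        // With explicit-only creation, leftovers were addressed to vertices
        // that were never created: drop them (or report an error).
        int undelivered = messages.size();
        messages.clear();
        return undelivered;
    }

    public static void main(String[] args) {
        for (boolean implicit : new boolean[] {false, true}) {
            ImplicitCreationSketch worker = new ImplicitCreationSketch();
            worker.vertices.put(1L, 0.0);   // only vertex 1 was loaded explicitly
            worker.send(1L, 0.5);
            worker.send(2L, 0.25);          // vertex 2 was never created
            if (implicit) {
                worker.createMissingVertices(0.0);
            }
            int undelivered = worker.computeSuperstep();
            System.out.println((implicit ? "implicit" : "explicit-only")
                + ": vertices=" + worker.vertices.keySet()
                + ", undelivered=" + undelivered
                + ", existenceChecks=" + worker.existenceChecks);
        }
    }
}
```

In this toy model the explicit-only path never touches the vertex set while buffering, whereas the implicit path pays one existence check per distinct destination (per message, if the check were done at receive time) plus a separate creation code path.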
On Fri, Jan 13, 2012 at 7:44 PM, Avery Ching <ach...@apache.org> wrote:
> What are you advocating in particular?
> Graph mutation should be allowed (e.g. adding vertices). We allow this to
> happen through the addVertexReq() interface and through the VertexResolver
> implementation (say, for messages to non-existent vertices). I can see why
> this would be useful. Imagine you are computing PageRank on the web graph,
> and you only have a subset of the sites, but all the outlinks for each site.
> It is nice to be able to allow new vertices (sites) while running the
> I agree that the way that vertices are created and initialized is a bit
> vague. We can work on improving the interfaces if anyone has suggestions.
> On 1/13/12 12:10 AM, Claudio Martella wrote:
>> Hi Avery,
>> Thanks for your feedback. I know that users can decide to drop this
>> behavior, but this doesn't mean that those three points don't hold, to
>> On Fri, Jan 13, 2012 at 8:35 AM, Avery Ching <ach...@apache.org> wrote:
>>> You are right that vertices are created automatically when messages are
>>> sent to non-existent vertices. But that behavior can be made
>>> application-specific. The default resolution of mutations/messages is
>>> But you are always welcome to implement your own application-specific
>>> behavior. For instance, you might just want to drop the message. If there
>>> is a simultaneous create/delete, you may want to always create. You have
>>> the power to implement any behavior you want by setting the vertex
>>> resolver class (see GiraphJob#setVertexResolverClass()).
>>> Hope this helps,
>>> On 1/12/12 3:42 PM, Claudio Martella wrote:
>>>> Hello Giraphers,
>>>> I have a few comments about the current design of Giraph regarding the
>>>> implicit creation of vertices.
>>>> As it's currently designed, if you send a message to a non-existent
>>>> vertex, Giraph creates it for you.
>>>> Although I understand it can come in handy, as it allows for lazy
>>>> dataset creation, I think it comes at a cost, and I believe this
>>>> cost outweighs the advantage:
>>>> 1) it overlaps with the mutation API, where a vertex can be created
>>>> explicitly when the semantics of the algorithm require it, with
>>>> knowledge of what's going on and with explicit state. This is an
>>>> ambiguous and unclear part of the API, which is difficult for me to
>>>> justify and probably confusing for users too. Which brings me to
>>>> the second point.
>>>> 2) it requires a separate, partially duplicated code path in our code
>>>> for mutations and for implicit vertex creation, as is clear from
>>>> looking at BasicRPCCommunication and as I experienced myself (see the
>>>> email I recently sent to the list). Which brings me to the third
>>>> point.
>>>> 3) in order to manage this, for every message we have to look up,
>>>> sooner or later, the worker's vertex set to see whether the vertex
>>>> exists and whether it should be implicitly created. This is
>>>> computationally expensive whether you use a HashMap or a TreeMap
>>>> for range partitioning. Also, if we're going to create more exotic
>>>> partitioning schemes (topology partitioning?), we're going to hit the problem
>>>> In general, I don't know of any graph API that doesn't require you
>>>> either to list the vertex set explicitly at load time or to create each
>>>> vertex explicitly through the API. As I said, I understand it allows for
>>>> lazy creation of the input file, with some vertices possibly not listed
>>>> explicitly (missing as a source vertex but present as an endpoint of an
>>>> edge), but this could really be fixed robustly by a single MapReduce
>>>> job.
>>>> What do you guys think?
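The single preprocessing MapReduce job proposed in the thread could look roughly like the sketch below. It is a plain-Java simulation of the map, shuffle, and reduce phases, not Hadoop API code (in an actual job the two phases would typically become a Hadoop Mapper and Reducer keyed by vertex id), and the tab/comma adjacency-list input format is an assumption. The mapper emits each source with its targets and each target with an empty edge list; grouping by vertex id and taking the union in the reducer yields every vertex exactly once, including those that appear only as edge endpoints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

/**
 * Plain-Java simulation (no Hadoop) of a one-pass map/shuffle/reduce job that
 * turns every edge endpoint into an explicitly listed vertex. The input format
 * "src<TAB>dst1,dst2,..." is an assumption made for this sketch.
 */
public class FillMissingVertices {

    /** Map: emit (src, its targets) plus (dst, no targets) for every endpoint. */
    static void map(String line, Map<Long, List<List<Long>>> shuffle) {
        String[] parts = line.split("\t", -1);
        long src = Long.parseLong(parts[0].trim());
        List<Long> targets = new ArrayList<>();
        if (parts.length > 1 && !parts[1].isBlank()) {
            for (String t : parts[1].split(",")) {
                targets.add(Long.parseLong(t.trim()));
            }
        }
        emit(shuffle, src, targets);
        for (long dst : targets) {
            emit(shuffle, dst, List.of());
        }
    }

    /** Shuffle: group emitted values by vertex id (the TreeMap keeps keys sorted). */
    static void emit(Map<Long, List<List<Long>>> shuffle, long key, List<Long> value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Reduce: union all target lists seen for one vertex id (duplicates collapse). */
    static String reduce(long id, List<List<Long>> values) {
        TreeSet<Long> targets = new TreeSet<>();
        for (List<Long> v : values) {
            targets.addAll(v);
        }
        return id + "\t" + targets.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    static List<String> run(List<String> input) {
        Map<Long, List<List<Long>>> shuffle = new TreeMap<>();
        for (String line : input) {
            map(line, shuffle);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, List<List<Long>>> e : shuffle.entrySet()) {
            output.add(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Vertices 3 and 4 appear only as edge endpoints in this toy input.
        List<String> input = List.of("1\t2,3", "2\t1,4");
        run(input).forEach(System.out::println);
    }
}
```

Because the reducer sees every id that occurs anywhere in the input, its output could then be loaded under an explicit-only creation policy, with no per-message existence checks at run time.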