[ 
https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150094#comment-13150094
 ] 

Jakob Homan commented on GIRAPH-83:
-----------------------------------

Looking at the original Pregel paper, the Vertex instance has eight methods 
(compute, vertex_id, superstep, GetValue, MutableValue, GetOutEdgeIterator, 
SendMessageTo and VoteToHalt). Currently, BasicVertex has 24.  There are also 
three different types of Vertices (Vertex, MutableVertex and BasicVertex) 
linked via inheritance and exposed to the users.  I'm wondering if this 
interface is quite right yet.

There are two main concerns.  First, this is the contract users are starting 
to write applications against and which we'll need to support for a long time, 
with as few tweaks as possible.  It'd be good to be relatively sure of its 
limits before we make an initial release.  Second, the use of inheritance to 
join the user's implementation with the computation's state makes it difficult 
to test.  How does one mock out the state that's fed into compute and verify 
compute's result without starting up a cluster (either real or local; see 
GIRAPH-51)?

Would it be reasonable to strip out as many methods as possible from Vertex, 
particularly those dealing with state external to the Vertex itself?  
Candidates:
* getSuperStep
* getNumVertices
* getNumEdges
* getMsgList/iterator
* getEdgeValue
* hasEdge
* sendMsg
* sendMsgToAllEdges
* (g|s)etGraphState
* getContext
* getWorkerContext
* registerAggregator
* useAggregator

The outEdges data structures are a bit odd in that they are intrinsic to the 
vertex itself (in the mathematical sense), but are managed by the framework.  
It might be a bit clunky, but it would be structurally more correct to 
separate these out as well.
  
These methods and the state they manipulate could then be passed in as a 
Context (a new type of Context, not one of the two others we have running 
around!) to the compute method.  This moves compute() closer to a functional, 
testable model of computing across its input state (which can be mocked out 
for testing and mangled as we evolve its innards).  The Vertex itself could 
still of course maintain any state it would need, but like a Mapper, shouldn't 
need much and would be discouraged from holding onto large amounts of data 
between computations.
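
To make the idea concrete, here is a rough sketch of what a context-driven 
compute() might look like.  Everything in it is hypothetical: ComputeContext, 
SimpleVertex, MaxValueVertex and RecordingContext are names made up for 
illustration and are not existing Giraph classes, and the real set of context 
methods would need to be worked out.  The point is only that compute() can be 
exercised with a hand-rolled fake and no cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical: the framework-owned state and operations that compute()
// needs, pulled out of the Vertex class hierarchy.
interface ComputeContext<M> {
    long getSuperstep();
    Iterable<M> getMessages();
    void sendMsgToAllEdges(M msg);
    void voteToHalt();
}

// Hypothetical slimmed-down vertex: it owns its value and nothing else.
abstract class SimpleVertex<V, M> {
    private V value;

    SimpleVertex(V initial) { this.value = initial; }

    V getValue() { return value; }
    void setValue(V value) { this.value = value; }

    // All external state arrives through the context argument.
    abstract void compute(ComputeContext<M> ctx);
}

// Example user code: keep the largest value seen and pass it on.
class MaxValueVertex extends SimpleVertex<Double, Double> {
    MaxValueVertex(double initial) { super(initial); }

    @Override
    void compute(ComputeContext<Double> ctx) {
        double max = getValue();
        for (double m : ctx.getMessages()) {
            max = Math.max(max, m);
        }
        boolean changed = max > getValue();
        setValue(max);
        if (ctx.getSuperstep() == 0 || changed) {
            ctx.sendMsgToAllEdges(max);   // tell neighbours about the new max
        } else {
            ctx.voteToHalt();             // nothing new to report
        }
    }
}

// Hand-rolled test double: feeds canned input and records what compute() did.
class RecordingContext implements ComputeContext<Double> {
    private final long superstep;
    private final List<Double> incoming;
    final List<Double> sent = new ArrayList<>();
    boolean halted = false;

    RecordingContext(long superstep, List<Double> incoming) {
        this.superstep = superstep;
        this.incoming = incoming;
    }

    @Override public long getSuperstep() { return superstep; }
    @Override public Iterable<Double> getMessages() { return incoming; }
    @Override public void sendMsgToAllEdges(Double msg) { sent.add(msg); }
    @Override public void voteToHalt() { halted = true; }
}

class ContextSketchDemo {
    public static void main(String[] args) {
        MaxValueVertex vertex = new MaxValueVertex(5.0);
        RecordingContext ctx = new RecordingContext(1, Arrays.asList(3.0, 7.0));
        vertex.compute(ctx);
        System.out.println("value=" + vertex.getValue()
            + " sent=" + ctx.sent + " halted=" + ctx.halted);
    }
}
```

A mocking library could stand in for RecordingContext just as well; the 
hand-written fake is used here only to keep the sketch free of dependencies.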

Thoughts?
                
> Is Vertex correct yet?
> ----------------------
>
>                 Key: GIRAPH-83
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-83
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Jakob Homan
>
> I'm seeing a number of people run into oddities with Vertex and am thinking 
> we may not have it quite correct yet...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
