[ https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150094#comment-13150094 ]
Jakob Homan commented on GIRAPH-83:
-----------------------------------
Looking at the original Pregel paper, the Vertex instance has eight methods
(compute, vertex_id, superstep, GetValue, MutableValue, GetOutEdgeIterator,
SendMessageTo and VoteToHalt). Currently, BasicVertex has 24. There are also
three different Vertex types (Vertex, MutableVertex and BasicVertex), linked
via inheritance and exposed to users. I'm wondering whether this interface is
quite right yet.
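For comparison, here is a rough Java rendering of the eight Pregel methods. This is a hypothetical sketch: the interface name `PregelStyleVertex` and the camel-cased method names are a transliteration of the paper's C++ API, not anything in Giraph. `setValue` stands in for `MutableValue`, since Java has no mutable-pointer return.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;

public class PregelStyleDemo {
    /**
     * Hypothetical Java analog of the Vertex methods in the Pregel paper.
     * Type parameters: I = vertex id, V = vertex value, E = edge value, M = message.
     */
    interface PregelStyleVertex<I, V, E, M> {
        void compute(Iterator<M> messages);              // Compute
        I getVertexId();                                 // vertex_id
        long getSuperstep();                             // superstep
        V getValue();                                    // GetValue
        void setValue(V value);                          // MutableValue (setter instead of a pointer)
        Iterator<Map.Entry<I, E>> getOutEdgeIterator();  // GetOutEdgeIterator
        void sendMessageTo(I target, M message);         // SendMessageTo
        void voteToHalt();                               // VoteToHalt
    }

    /** Counts the methods declared on the interface, ignoring compiler/tool-generated ones. */
    static long methodCount() {
        return Arrays.stream(PregelStyleVertex.class.getDeclaredMethods())
                .filter(m -> !m.isSynthetic())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(methodCount()); // prints 8
    }
}
```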
There are two main concerns. First, this is the contract users are starting to
write applications against, and one we'll need to support for a long time
with as few tweaks as possible. It'd be good to be relatively sure of its
limits before we make an initial release. Second, using inheritance to join
the user's implementation with the computation's state makes it difficult to
test. How does one mock out the state that's fed into compute and verify
compute's result without starting up a cluster (either real or local; see
GIRAPH-51)?
Would it be reasonable to strip out as many methods as possible from Vertex,
particularly those dealing with state external to the Vertex itself:
* getSuperStep
* getNumVertices
* getNumEdges
* getMsgList/iterator
* getEdgeValue
* hasEdge
* sendMsg
* sendMsgToAllEdges
* (g|s)etGraphState
* getContext
* getWorkerContext
* registerAggregator
* useAggregator
The outEdges data structures are a bit odd in that they are intrinsic to the
vertex itself (in the mathematical sense), but are managed by the framework.
It might be a bit clunky, but it would be structurally more correct to
separate these out as well.
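One way to separate the out-edges is to have the framework own them and hand the vertex a read-only view. The sketch below is hypothetical: `OutEdges`, `MapOutEdges` and `totalWeight` are illustrative names, not existing Giraph types. A map-backed view like this could also double as a test fixture.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class OutEdgesDemo {
    /** Hypothetical read-only view of a vertex's out-edges; the framework owns the data. */
    interface OutEdges<I, E> extends Iterable<Map.Entry<I, E>> {
        boolean hasEdge(I target);
        E getEdgeValue(I target); // null if there is no such edge
        int size();
    }

    /** Map-backed implementation, e.g. for unit tests. */
    static final class MapOutEdges<I, E> implements OutEdges<I, E> {
        private final Map<I, E> edges;

        MapOutEdges(Map<I, E> edges) {
            // Defensive copy wrapped as unmodifiable: the vertex can read but not mutate.
            this.edges = Collections.unmodifiableMap(new LinkedHashMap<>(edges));
        }

        @Override public boolean hasEdge(I target) { return edges.containsKey(target); }
        @Override public E getEdgeValue(I target) { return edges.get(target); }
        @Override public int size() { return edges.size(); }
        @Override public Iterator<Map.Entry<I, E>> iterator() { return edges.entrySet().iterator(); }
    }

    /** Example user logic that reads edges passed in rather than owned by the vertex. */
    static double totalWeight(OutEdges<Long, Double> edges) {
        double sum = 0;
        for (Map.Entry<Long, Double> e : edges) {
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Long, Double> m = new LinkedHashMap<>();
        m.put(2L, 1.5);
        m.put(3L, 2.5);
        OutEdges<Long, Double> edges = new MapOutEdges<>(m);
        System.out.println(edges.hasEdge(2L) + " " + edges.size() + " " + totalWeight(edges)); // true 2 4.0
    }
}
```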
The methods listed above, and the state they manipulate, could then be passed
to the compute method as a Context (a new type of Context, not one of the two
others we have running around!). This moves compute() closer to a functional,
testable model of computation over its input state, which can be mocked out
for testing and mangled as we evolve its innards. The Vertex itself could, of
course, still maintain any state it needs, but, like a Mapper, it shouldn't
need much and would be discouraged from holding onto large amounts of data
between computations.
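A minimal sketch of the idea, assuming a hypothetical `ComputeContext` interface (not an existing Giraph type) that carries the superstep, message sending and halting. The vertex keeps only its id and value. The example logic is the Pregel paper's "propagate the maximum value" computation. Because the context is just an interface, a recording fake can drive compute() in a plain unit test with no cluster, real or local.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContextComputeDemo {
    /** Hypothetical per-superstep context handed to compute(). */
    interface ComputeContext<I, M> {
        long getSuperstep();
        void sendMsg(I target, M message);
        void sendMsgToAllEdges(M message);
        void voteToHalt();
    }

    /** A vertex that owns only its id and value; everything else arrives via the context. */
    static final class MaxValueVertex {
        final long id;
        int value;

        MaxValueVertex(long id, int value) {
            this.id = id;
            this.value = value;
        }

        // Max-value propagation: broadcast on superstep 0 or when a larger value arrives,
        // otherwise vote to halt.
        void compute(Iterable<Integer> messages, ComputeContext<Long, Integer> ctx) {
            int max = value;
            for (int m : messages) {
                max = Math.max(max, m);
            }
            if (ctx.getSuperstep() == 0 || max > value) {
                value = max;
                ctx.sendMsgToAllEdges(value);
            } else {
                ctx.voteToHalt();
            }
        }
    }

    /** Recording fake context: captures what compute() did so a test can assert on it. */
    static final class RecordingContext implements ComputeContext<Long, Integer> {
        final long superstep;
        final List<Integer> broadcasts = new ArrayList<>();
        boolean halted;

        RecordingContext(long superstep) { this.superstep = superstep; }

        @Override public long getSuperstep() { return superstep; }
        @Override public void sendMsg(Long target, Integer message) { /* unused in this example */ }
        @Override public void sendMsgToAllEdges(Integer message) { broadcasts.add(message); }
        @Override public void voteToHalt() { halted = true; }
    }

    public static void main(String[] args) {
        MaxValueVertex v = new MaxValueVertex(1L, 3);
        RecordingContext ctx = new RecordingContext(1);
        v.compute(Arrays.asList(6, 2), ctx);
        System.out.println(v.value + " " + ctx.broadcasts + " " + ctx.halted); // 6 [6] false
    }
}
```

The design choice is the one argued above: compute() becomes a function of (messages, context, own value), so the framework's internals can change behind the interface without touching user code or its tests.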
Thoughts?
> Is Vertex correct yet?
> ----------------------
>
> Key: GIRAPH-83
> URL: https://issues.apache.org/jira/browse/GIRAPH-83
> Project: Giraph
> Issue Type: Improvement
> Reporter: Jakob Homan
>
> I'm seeing a number of people run into oddities with Vertex and am thinking
> we may not have it quite correct yet...