[
https://issues.apache.org/jira/browse/GIRAPH-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567979#comment-13567979
]
Alessandro Presta commented on GIRAPH-494:
------------------------------------------
Sorry I'm late to this party. I'll share my thoughts on this anyway:
1) Regarding EdgeNoValue, the same as in GIRAPH-493 applies: in principle it
looks like getting rid of a reference per edge might help, but let's verify
that with a benchmark or two. For one thing, it won't affect GC (references are
not objects), so the only impact is on memory usage. In your 1B edges/worker
example, this amounts to about 3.7GB (10^9 references at 4 bytes each, i.e. with
compressed oops). Compared to how much memory we consume overall on that same
worker (for this data size), you could argue it's peanuts.
2) "Note that only one class in our codebase actually needs a MutableEdge -
RepresentativeVertex": this is from the point of view of the implementation
(RepresentativeVertex needs to reuse edge objects). From an API level, it's
somewhat of a gray area whether an algorithm should be allowed to modify edge
values in place. Ask Maja, who's trying to do exactly that. This change makes
it impossible (which is a possible solution; we just need to be clear on the
semantics of objects we hand to the user: is this a reference to an internal
data structure? Is it just a copy?).
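To make the reference-versus-copy question concrete, here is a minimal Java sketch. It is not Giraph code: `EdgeStore`, `SimpleEdge`, `getReference` and `getCopy` are hypothetical names, and the edge value is simplified to a `double`. It shows that when the framework hands out a live reference, an in-place `setValue` changes the stored graph, while with a copy the update is silently lost.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeSemanticsSketch {
  /** Hypothetical edge whose value can be updated in place. */
  static final class SimpleEdge {
    private final long targetId;
    private double value;

    SimpleEdge(long targetId, double value) {
      this.targetId = targetId;
      this.value = value;
    }

    long getTargetId() { return targetId; }
    double getValue() { return value; }
    void setValue(double value) { this.value = value; }
  }

  /** Hypothetical per-vertex edge storage owned by the framework. */
  static final class EdgeStore {
    private final List<SimpleEdge> edges = new ArrayList<>();

    void add(long targetId, double value) {
      edges.add(new SimpleEdge(targetId, value));
    }

    /** Hands out the stored object itself: caller mutations are visible. */
    SimpleEdge getReference(int index) {
      return edges.get(index);
    }

    /** Hands out a fresh copy: caller mutations never reach the store. */
    SimpleEdge getCopy(int index) {
      SimpleEdge stored = edges.get(index);
      return new SimpleEdge(stored.getTargetId(), stored.getValue());
    }

    double storedValue(int index) {
      return edges.get(index).getValue();
    }
  }

  public static void main(String[] args) {
    EdgeStore store = new EdgeStore();
    store.add(7L, 1.0);

    store.getCopy(0).setValue(2.0);           // updates a detached copy only
    System.out.println(store.storedValue(0)); // 1.0

    store.getReference(0).setValue(3.0);      // updates the stored edge in place
    System.out.println(store.storedValue(0)); // 3.0
  }
}
```

Either contract is workable; what matters is that the API states which one applies. In this sketch the copy also costs an allocation per call, which adds up at the edge counts discussed in this issue.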
> Edge should be an interface
> ---------------------------
>
> Key: GIRAPH-494
> URL: https://issues.apache.org/jira/browse/GIRAPH-494
> Project: Giraph
> Issue Type: Bug
> Reporter: Nitay Joffe
> Assignee: Nitay Joffe
> Attachments: GIRAPH-494.patch
>
>
> In terms of architecture and for flexibility I think our Edge class should be
> an interface instead of a real class. In this diff I change it to an
> interface and add a sub interface called MutableEdge. The existing Edge class
> is now called DefaultEdge. Note that only one class in our codebase actually
> needs a MutableEdge - RepresentativeVertex. Everything else works perfectly
> fine using the immutable Edge interface.
> One nice thing this allowed me to do is to create an EdgeNoValue, which we can
> use for algorithms whose edges have no value at all. Currently the same
> functionality is achieved by using NullWritable; however, using EdgeNoValue
> means not storing a reference to the single NullWritable instance in every
> single edge. Working on a job that reads 1B+ edges per worker, a pointer per
> edge adds up.
> https://reviews.apache.org/r/9172/
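The hierarchy described in the issue (Edge, MutableEdge, DefaultEdge, EdgeNoValue) can be sketched roughly as follows. This is a simplified, dependency-free illustration, not the code in GIRAPH-494.patch: the real classes use Hadoop Writable types, whereas here the type parameters are unbounded and a hypothetical `NoValue` enum stands in for NullWritable. `sumTargetIds` and `referenceGiB` are hypothetical helpers; the latter reproduces the back-of-envelope figure from the comment, assuming 4-byte (compressed) references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeInterfaceSketch {
  /** Read-only edge view: what most algorithm code needs. */
  interface Edge<I, E> {
    I getTargetVertexId();
    E getValue();
  }

  /** Sub-interface for the few callers that reuse edge objects in place. */
  interface MutableEdge<I, E> extends Edge<I, E> {
    void setTargetVertexId(I targetVertexId);
    void setValue(E value);
  }

  /** Hypothetical stand-in for NullWritable: one shared "no value" instance. */
  enum NoValue { INSTANCE }

  /** General-purpose edge: holds a target id and a value reference. */
  static final class DefaultEdge<I, E> implements MutableEdge<I, E> {
    private I targetVertexId;
    private E value;

    DefaultEdge(I targetVertexId, E value) {
      this.targetVertexId = targetVertexId;
      this.value = value;
    }

    @Override public I getTargetVertexId() { return targetVertexId; }
    @Override public E getValue() { return value; }
    @Override public void setTargetVertexId(I id) { this.targetVertexId = id; }
    @Override public void setValue(E value) { this.value = value; }
  }

  /** Value-less edge: no value field, so no per-edge reference to NoValue. */
  static final class EdgeNoValue<I> implements Edge<I, NoValue> {
    private final I targetVertexId;

    EdgeNoValue(I targetVertexId) { this.targetVertexId = targetVertexId; }

    @Override public I getTargetVertexId() { return targetVertexId; }
    @Override public NoValue getValue() { return NoValue.INSTANCE; }
  }

  /** Algorithm code written against Edge works with either implementation. */
  static long sumTargetIds(List<? extends Edge<Long, NoValue>> edges) {
    long sum = 0;
    for (Edge<Long, NoValue> edge : edges) {
      sum += edge.getTargetVertexId();
    }
    return sum;
  }

  /** Size of one reference field per edge, in GiB (2^30 bytes). */
  static double referenceGiB(long numEdges, int bytesPerReference) {
    return (double) numEdges * bytesPerReference / (1L << 30);
  }

  public static void main(String[] args) {
    List<Edge<Long, NoValue>> edges = new ArrayList<>();
    edges.add(new DefaultEdge<>(1L, NoValue.INSTANCE));
    edges.add(new EdgeNoValue<>(2L));
    System.out.println(sumTargetIds(edges)); // 3
    System.out.printf(Locale.ROOT, "%.2f%n", referenceGiB(1_000_000_000L, 4)); // 3.73
  }
}
```

Dropping a field does not always shrink an object by exactly one reference, because HotSpot pads objects to an 8-byte boundary by default. That is one more reason to confirm the saving with a benchmark rather than from field counts alone.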