[
https://issues.apache.org/jira/browse/GIRAPH-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568052#comment-13568052
]
Nitay Joffe commented on GIRAPH-494:
------------------------------------
1) Agreed it's memory only, but it's actually closer to 8GB. We're on a 64-bit
machine, so pointers are 8 bytes each; I don't think you could get a 32-bit JVM
to load 1B+ edges. I have about 1.2B edges per worker, so let's call it 10GB
total. To me that does not seem like peanuts in terms of active memory used.
2) I would argue that having Edge / MutableEdge as interfaces is the right way
to go in terms of object-oriented design. This change does not make it
impossible to change edges; we just have to expose MutableEdge where changes
are desired. If the algorithm knows it is using MutableEdge, then it stores
those and can use them as such. We already have gotchas in the codebase like
RepresentativeVertex, where the user needs to know that they shouldn't change
Vertex/Edge objects retrieved. If anything, I think having clear-cut interfaces
like this does exactly the opposite: it makes it explicitly clear what the API
is and allows us to control it, rather than exposing big Java objects with lots
of public methods.
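
To make the split in point 2 concrete, here is a minimal sketch of a read-only
Edge interface with a MutableEdge sub-interface. It is an illustration only,
not the contents of GIRAPH-494.patch: the method names, the unbounded type
parameters (the real code would presumably use Hadoop Writable types), and the
EdgeInterfaceSketch / sumValues names are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeInterfaceSketch {
    /** Read-only view of an edge; assumed to be all that most code needs. */
    public interface Edge<I, E> {
        I getTargetVertexId();
        E getValue();
    }

    /** Adds mutators; exposed only where changes are desired. */
    public interface MutableEdge<I, E> extends Edge<I, E> {
        void setTargetVertexId(I targetVertexId);
        void setValue(E value);
    }

    /** Plain implementation, standing in for the renamed DefaultEdge. */
    public static final class DefaultEdge<I, E> implements MutableEdge<I, E> {
        private I targetVertexId;
        private E value;

        public DefaultEdge(I targetVertexId, E value) {
            this.targetVertexId = targetVertexId;
            this.value = value;
        }

        @Override public I getTargetVertexId() { return targetVertexId; }
        @Override public E getValue() { return value; }
        @Override public void setTargetVertexId(I id) { this.targetVertexId = id; }
        @Override public void setValue(E value) { this.value = value; }
    }

    /** Hypothetical reader: typed against Edge, so it cannot call setters. */
    public static double sumValues(Iterable<? extends Edge<Long, Double>> edges) {
        double sum = 0.0;
        for (Edge<Long, Double> edge : edges) {
            sum += edge.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Code that needs to modify edges stores them as MutableEdge.
        List<MutableEdge<Long, Double>> edges = new ArrayList<>();
        edges.add(new DefaultEdge<>(2L, 0.5));
        edges.add(new DefaultEdge<>(3L, 1.5));
        edges.get(0).setValue(1.0);

        // Everything else sees only the immutable interface.
        System.out.println(sumValues(edges));
    }
}
```

Code that holds a MutableEdge can call the setters, while anything typed
against Edge gets a compile error if it tries, which is the kind of explicit
API control argued for above.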
> Edge should be an interface
> ---------------------------
>
> Key: GIRAPH-494
> URL: https://issues.apache.org/jira/browse/GIRAPH-494
> Project: Giraph
> Issue Type: Bug
> Reporter: Nitay Joffe
> Assignee: Nitay Joffe
> Attachments: GIRAPH-494.patch
>
>
> In terms of architecture and for flexibility I think our Edge class should be
> an interface instead of a real class. In this diff I change it to an
> interface and add a sub interface called MutableEdge. The existing Edge class
> is now called DefaultEdge. Note that only one class in our codebase actually
> needs a MutableEdge - RepresentativeVertex. Everything else works perfectly
> fine using the immutable Edge interface.
> One nice thing this allowed me to do is to create an EdgeNoValue, which we can
> use for algorithms whose edges have no value at all. Currently the same
> functionality is achieved by using NullWritable; however, using EdgeNoValue
> means not storing a reference to the single NullWritable instance in every
> single edge. When working on a job that reads 1B+ edges per worker, a pointer
> per edge adds up.
> https://reviews.apache.org/r/9172/
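
The EdgeNoValue idea and the per-edge pointer cost can be sketched as follows.
This is a hedged illustration, not the patch itself: NoValue is a hypothetical
stand-in for Hadoop's NullWritable singleton, the Edge interface is simplified,
and referenceBytes assumes the 8-byte references mentioned in the comment above.

```java
public class EdgeNoValueSketch {
    /** Simplified read-only edge interface (assumed shape). */
    public interface Edge<I, E> {
        I getTargetVertexId();
        E getValue();
    }

    /** Hypothetical stand-in for the NullWritable singleton. */
    public static final class NoValue {
        private static final NoValue INSTANCE = new NoValue();
        private NoValue() { }
        public static NoValue get() { return INSTANCE; }
    }

    /** Edge with no value field: nothing is stored per edge for the value. */
    public static final class EdgeNoValue<I> implements Edge<I, NoValue> {
        private final I targetVertexId;

        public EdgeNoValue(I targetVertexId) {
            this.targetVertexId = targetVertexId;
        }

        @Override public I getTargetVertexId() { return targetVertexId; }

        // Returns the shared singleton instead of reading a per-edge field.
        @Override public NoValue getValue() { return NoValue.get(); }
    }

    /** Bytes taken by one reference field per edge, summed over all edges. */
    public static long referenceBytes(long edgeCount, int bytesPerReference) {
        return edgeCount * bytesPerReference;
    }

    public static void main(String[] args) {
        Edge<Long, NoValue> edge = new EdgeNoValue<>(42L);
        System.out.println(edge.getTargetVertexId());

        // 1.2B edges per worker at 8 bytes per reference (figures from the thread).
        long bytes = referenceBytes(1_200_000_000L, 8);
        System.out.println(bytes + " bytes");
    }
}
```

With those figures the value reference alone comes to 9,600,000,000 bytes per
worker, which is the "call it 10GB" estimate in the comment.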