[ https://issues.apache.org/jira/browse/GIRAPH-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568530#comment-13568530 ]
Claudio Martella commented on GIRAPH-494:
-----------------------------------------
Quite frankly, the memory impact of this patch can be estimated without benchmarks:
it is one reference per edge, there's no magic involved. Comparisons between
Giraph and other systems show that we use, and waste, a lot of memory. I recently
ran PageRankBenchmark on 64 workers with a 7GB heap each, on a graph with 65M
vertices and 100 edges per vertex, and it went OOM. This is quite incredible.
Other systems (Signal/Collect) run PageRank on that graph with fewer machines and
less memory, within 60 seconds. Memory consumption should be our top priority.
Plus, I strongly believe that most algorithms out there live happily without an
edge value, and we should not penalize them.
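A rough back-of-the-envelope version of that estimate, as a Java sketch. Assumptions (not taken from the benchmark run): edges are spread evenly across workers, and each extra reference costs 4 bytes with compressed oops (which HotSpot typically uses for heaps under about 32GB) or 8 bytes without them. It ignores object headers and alignment padding, which can absorb or amplify the per-object saving.

```java
/**
 * Back-of-the-envelope cost of one extra object reference per edge.
 * Assumes edges are spread evenly across workers. The reference sizes are
 * assumptions (4 bytes with compressed oops, 8 bytes without), and object
 * alignment padding is ignored.
 */
final class EdgeReferenceOverhead {
    static final int COMPRESSED_REF_BYTES = 4;
    static final int UNCOMPRESSED_REF_BYTES = 8;

    private EdgeReferenceOverhead() {}

    /** Total edges when every vertex has the same number of out-edges. */
    static long totalEdges(long vertices, long edgesPerVertex) {
        return vertices * edgesPerVertex;
    }

    /** Heap bytes per worker spent only on the per-edge value reference. */
    static long referenceBytesPerWorker(long totalEdges, int workers, int refBytes) {
        return (totalEdges / workers) * refBytes;
    }
}
```

For a graph of that size: 65M x 100 = 6.5B edges, or about 101.6M edges per worker. If every one of those edges carried a reference to a shared NullWritable, the references alone would take roughly 406MB of each 7GB heap (about 812MB with 8-byte references).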
I agree with you that the API is not there yet: it is not coherent, and there
is no bigger picture. But we have not released 0.2 yet, and this is the
moment to break the API. Of course, this does not mean that we should keep
breaking it regardless.
> Edge should be an interface
> ---------------------------
>
> Key: GIRAPH-494
> URL: https://issues.apache.org/jira/browse/GIRAPH-494
> Project: Giraph
> Issue Type: Bug
> Reporter: Nitay Joffe
> Assignee: Nitay Joffe
> Attachments: GIRAPH-494.patch
>
>
> In terms of architecture and for flexibility I think our Edge class should be
> an interface instead of a real class. In this diff I change it to an
> interface and add a sub interface called MutableEdge. The existing Edge class
> is now called DefaultEdge. Note that only one class in our codebase actually
> needs a MutableEdge - RepresentativeVertex. Everything else works perfectly
> fine using the immutable Edge interface.
> One nice thing this allowed me to do is to create an EdgeNoValue, which we can
> use for algorithms whose edges have no value at all. Currently the same
> functionality is achieved by using NullWritable, however using EdgeNoValue
> means not storing a reference to the single NullWritable instance in every
> single edge. Working on a job that reads 1B+ edges per worker, a pointer per
> edge adds up.
> https://reviews.apache.org/r/9172/
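A minimal Java sketch of the hierarchy the issue describes: Edge, MutableEdge, DefaultEdge and EdgeNoValue. This is not the code from GIRAPH-494.patch. The method names (getTargetVertexId, getValue, setValue) are assumptions for illustration, Giraph's Writable type bounds are omitted, and the hypothetical NoValue enum stands in for Hadoop's NullWritable singleton so the sketch compiles without Hadoop.

```java
/**
 * Hypothetical sketch of the Edge hierarchy described in GIRAPH-494.
 * Method names are illustrative; Giraph's Writable type bounds are omitted
 * and NoValue stands in for Hadoop's NullWritable singleton.
 */
final class EdgeInterfaceSketch {
    private EdgeInterfaceSketch() {}

    /** Stand-in for the single shared NullWritable instance (assumption). */
    enum NoValue { INSTANCE }

    /** Read-only edge: enough for almost every caller. */
    interface Edge<I, E> {
        I getTargetVertexId();
        E getValue();
    }

    /** Sub-interface for the few callers that must change an edge in place. */
    interface MutableEdge<I, E> extends Edge<I, E> {
        void setValue(E value);
    }

    /** General-purpose edge (the former concrete Edge class): stores a value field. */
    static final class DefaultEdge<I, E> implements MutableEdge<I, E> {
        private final I targetVertexId;
        // One reference per edge, even when it always points at NoValue.INSTANCE.
        private E value;

        DefaultEdge(I targetVertexId, E value) {
            this.targetVertexId = targetVertexId;
            this.value = value;
        }

        @Override public I getTargetVertexId() { return targetVertexId; }
        @Override public E getValue() { return value; }
        @Override public void setValue(E value) { this.value = value; }
    }

    /** Value-less edge: stores only the target id and returns the shared singleton. */
    static final class EdgeNoValue<I> implements Edge<I, NoValue> {
        private final I targetVertexId;

        EdgeNoValue(I targetVertexId) { this.targetVertexId = targetVertexId; }

        @Override public I getTargetVertexId() { return targetVertexId; }
        // The singleton is returned on demand, not stored per edge.
        @Override public NoValue getValue() { return NoValue.INSTANCE; }
    }

    /** Code written against the read-only Edge works with either implementation. */
    static <I> I targetOf(Edge<I, ?> edge) { return edge.getTargetVertexId(); }
}
```

The design point is that EdgeNoValue declares one field fewer than DefaultEdge, while callers written against Edge, like targetOf, accept either. Whether dropping that field actually shrinks each object depends on the JVM's header size and alignment padding.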
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira