[
https://issues.apache.org/jira/browse/GIRAPH-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486170#comment-13486170
]
Eli Reisman commented on GIRAPH-388:
------------------------------------
I like this idea; I think there's a lot of potential improvement to be made in
this area of the code. One thing about the VertexIdMessageCollection idea: by
storing straight pairs of I and M, we duplicate a lot of info. The same
vertexId can often occur many times in a given outgoing
VertexIdMessageCollection when multiple messages are sent to the same vertex,
and the same goes for duplicate msgs.
Is there a way we can use some kind of mapping to avoid the duplication of
information, and the increased size of the serialized message data structures?
What other methods of simplifying the outgoing message data structures did you
try? What sort of memory/space tradeoffs are we making for the speed gains
doing it this way?
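
To make the question concrete, here is a minimal sketch of the kind of mapping
meant above. It is not Giraph code: the class GroupedMessages, the method
groupByVertex, and the use of long ids with String messages are all
hypothetical stand-ins for I and M. It groups a flat run of (vertexId, message)
pairs so that each vertexId is stored once per collection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Giraph or the GIRAPH-388 patch.
public class GroupedMessages {

    // Takes parallel arrays standing in for flat (vertexId, message) pairs
    // and groups the messages under a single entry per vertexId.
    static Map<Long, List<String>> groupByVertex(long[] ids, String[] msgs) {
        Map<Long, List<String>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < ids.length; i++) {
            grouped.computeIfAbsent(ids[i], k -> new ArrayList<>()).add(msgs[i]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        long[] ids = {7L, 7L, 9L, 7L};       // vertex 7 appears three times
        String[] msgs = {"a", "b", "c", "d"};
        Map<Long, List<String>> grouped = groupByVertex(ids, msgs);
        System.out.println("ids stored, flat layout:    " + ids.length);
        System.out.println("ids stored, grouped layout: " + grouped.size());
    }
}
```

The grouped layout stores fewer ids but brings back a map plus a collection per
key, which is the overhead the issue description reports as expensive, so this
only pays off when ids repeat often enough.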
Finally, I would love to see these data structures integrate better with the
disk spill structures, so that if we take steps not to duplicate references
needlessly on the sender side (with partition -> vertexId -> messages, for
example), we do not create fresh duplicates of all these repeated references
when we deserialize from disk. I kind of figured the next redesign in this area
would incorporate some steps in that direction.
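
One way to picture the deserialization concern, as a sketch under assumptions
rather than a description of Giraph's spill code: the class SpillInternSketch
and its spill/readBack methods are hypothetical, and boxed Long ids stand in
for vertex id objects. While reading ids back, a canonicalizing map hands out
one shared object per distinct id instead of a fresh one per occurrence.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Giraph's disk spill implementation.
public class SpillInternSketch {

    // Writes a count followed by the ids, standing in for a spill to disk.
    static byte[] spill(long[] ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ids.length);
            for (long id : ids) {
                out.writeLong(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the ids back, reusing one object per distinct id value.
    static List<Long> readBack(byte[] data) {
        Map<Long, Long> canonical = new HashMap<>();
        List<Long> result = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                Long fresh = in.readLong();
                // putIfAbsent returns the earlier object if this id was seen.
                Long shared = canonical.putIfAbsent(fresh, fresh);
                result.add(shared == null ? fresh : shared);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

The canonicalizing map costs memory of its own, so in practice it would
probably need to be scoped per partition or per batch.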
Anyway, nice work. I'm curious to see where this leads. I love the idea of
simplifying and reevaluating this part of the code base.
> Improve the way we keep outgoing messages
> -----------------------------------------
>
> Key: GIRAPH-388
> URL: https://issues.apache.org/jira/browse/GIRAPH-388
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Attachments: GIRAPH-388.patch
>
>
> As per the discussion on GIRAPH-357, in a standard application the chances
> that we get to use the client-side combiner are very low. I experimented with
> the benefits we can get from not having the client-side combiner at all. It
> turns out that having a lot of maps in SendMessageCache, and then a collection
> inside each of them, really hurts performance.