[ 
https://issues.apache.org/jira/browse/GIRAPH-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486170#comment-13486170
 ] 

Eli Reisman commented on GIRAPH-388:
------------------------------------

I like this idea; I think there's a lot of potential for improvement in 
this area of the code. One concern with the VertexIdMessageCollection idea: by 
storing flat pairs of I and M, we duplicate a lot of information. The same 
vertexId can often occur many times in a given outgoing 
VertexIdMessageCollection when multiple messages are sent to the same vertex, 
and the same goes for duplicate messages.

Is there a way we can use some kind of mapping to avoid the duplication of 
information, and the increased size of the serialized message data structures? 
What other methods of simplifying the outgoing message data structures did you 
try? What sort of memory/space tradeoffs are we making for the speed gains 
doing it this way?
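
To make the trade-off in question concrete, here is a minimal sketch in plain 
Java. It is not Giraph code: the class OutgoingMessageSketch and its methods 
are hypothetical names, and vertex ids and messages are simplified to long and 
String rather than the generic I and M types. It only contrasts how many vertex 
ids a flat pair layout would carry against a layout grouped by target vertex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Giraph.
final class OutgoingMessageSketch {

    // Flat layout: every message carries its own copy of the target id,
    // so the number of ids written equals the number of messages.
    static int idsWrittenFlat(long[] targetIds) {
        return targetIds.length;
    }

    // Grouped layout: each distinct target id is stored once, followed by
    // all messages addressed to it. Insertion order is preserved.
    static Map<Long, List<String>> groupByTarget(long[] targetIds,
                                                 String[] messages) {
        if (targetIds.length != messages.length) {
            throw new IllegalArgumentException("ids and messages must align");
        }
        Map<Long, List<String>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < targetIds.length; i++) {
            grouped.computeIfAbsent(targetIds[i], k -> new ArrayList<>())
                   .add(messages[i]);
        }
        return grouped;
    }
}
```

With five messages addressed to vertices 1, 2, 1, 1, 2, the flat layout 
carries five ids while the grouped one carries two. The cost of grouping is a 
map lookup and a nested collection per target on the send path, which is the 
overhead the issue description reports as hurting performance.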

Finally, I would love to see these data structures integrate better with the 
disk spill structures, so that if we take steps to avoid duplicating 
references needlessly on the sender side (with partition -> vertexId -> 
messages, for example), we do not create fresh duplicates of all those 
repeated references when we deserialize from disk. I figured the next redesign 
in this area would incorporate some steps in that direction.

Anyway, nice work; I'm curious to see where this leads. I love the idea of 
simplifying and reevaluating this part of the code base.

                
> Improve the way we keep outgoing messages
> -----------------------------------------
>
>                 Key: GIRAPH-388
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-388
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>         Attachments: GIRAPH-388.patch
>
>
> As per the discussion on GIRAPH-357, in a standard application the chances 
> that we get to use the client-side combiner are very low. I experimented with 
> the benefits we can get from not having the client-side combiner at all. It 
> turns out that having a lot of maps in SendMessageCache, and then a 
> collection inside each of them, really hurts performance. 

