[
https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687323#comment-15687323
]
Greg Hogan commented on FLINK-5127:
-----------------------------------
+1
Does the {{Either}} type need to be sorted or can we rely on forwarding fields
to preserve the order?
Object reuse may be slower as {{Either}} types are immutable.
> Reduce the amount of intermediate data in vertex-centric iterations
> -------------------------------------------------------------------
>
> Key: FLINK-5127
> URL: https://issues.apache.org/jira/browse/FLINK-5127
> Project: Flink
> Issue Type: Improvement
> Components: Gelly
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
>
> The vertex-centric plan contains a join between the workset (messages) and
> the solution set (vertices) that outputs <Vertex, Message> tuples. This
> intermediate dataset is then co-grouped with the edges to provide the Pregel
> interface directly.
> This issue proposes an improvement to reduce the size of this intermediate
> dataset. In particular, the vertex state does not have to be attached to all
> the output tuples of the join. If we replace the join with a coGroup and use
> an `Either` type, we can attach the vertex state to the first tuple only. The
> subsequent coGroup can retrieve the vertex state from the first tuple and
> correctly expose the Pregel interface.
> In my preliminary experiments, I find that this change reduces intermediate
> data by 2x for small vertex state and 4-5x for large vertex states.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)