[
https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-5127:
----------------------------------
Labels: auto-deprioritized-major auto-unassigned (was: auto-unassigned
stale-major)
Priority: Minor (was: Major)
This issue was labeled "stale-major" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Major, please
raise the priority and ask a committer to assign you the issue or revive the
public discussion.
> Reduce the amount of intermediate data in vertex-centric iterations
> -------------------------------------------------------------------
>
> Key: FLINK-5127
> URL: https://issues.apache.org/jira/browse/FLINK-5127
> Project: Flink
> Issue Type: Improvement
> Components: Library / Graph Processing (Gelly)
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Vasia Kalavri
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned
>
> The vertex-centric plan contains a join between the workset (messages) and
> the solution set (vertices) that outputs <Vertex, Message> tuples. This
> intermediate dataset is then co-grouped with the edges to provide the Pregel
> interface directly.
> This issue proposes an improvement to reduce the size of this intermediate
> dataset. In particular, the vertex state does not have to be attached to all
> the output tuples of the join. If we replace the join with a coGroup and use
> an `Either` type, we can attach the vertex state to the first tuple only. The
> subsequent coGroup can retrieve the vertex state from the first tuple and
> correctly expose the Pregel interface.
> In my preliminary experiments, I find that this change reduces intermediate
> data by 2x for small vertex state and 4-5x for large vertex states.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)