[ https://issues.apache.org/jira/browse/FLINK-11256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chesnay Schepler updated FLINK-11256:
-------------------------------------
Issue Type: Improvement (was: Bug)
> Referencing StreamNode objects directly in StreamEdge causes the sizes of
> JobGraph and TDD to become unnecessarily large
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-11256
> URL: https://issues.apache.org/jira/browse/FLINK-11256
> Project: Flink
> Issue Type: Improvement
> Affects Versions: 1.7.0, 1.7.1
> Reporter: Haibo Sun
> Assignee: Haibo Sun
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.8.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a job graph is generated from a StreamGraph, the StreamEdge(s) of the
> stream graph are serialized into StreamConfig and stored in the job graph.
> The serialized bytes are then included in the TDD and distributed to the TMs.
> Because StreamEdge directly references StreamNode objects, including
> sourceVertex and targetVertex, these objects are also written transitively
> whenever a StreamEdge is serialized, even though they are not needed in the
> JM or the Task. For a large topology, this causes the JobGraph/TDD to become
> much larger than necessary, which makes RPC timeouts more likely when it is
> transmitted.
> In StreamEdge, only the ID of each StreamNode should be stored to avoid this
> situation.
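
The size effect described above can be reproduced with plain Java serialization. The following is only a minimal sketch under stated assumptions: `Node`, `EdgeWithNodeRefs`, `EdgeWithNodeIds` and the 10,000-byte payload are hypothetical stand-ins invented for illustration, not Flink's actual StreamNode/StreamEdge classes and not the change made for this issue. It shows that an edge holding object references pulls both nodes into its serialized form, while an edge holding only IDs does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class EdgeSizeSketch {

    // Hypothetical stand-in for StreamNode: carries state an edge never needs.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final byte[] payload; // simulates per-node state (assumed size)

        Node(int id, int payloadBytes) {
            this.id = id;
            this.payload = new byte[payloadBytes];
        }
    }

    // Edge that references the nodes directly; serializing it writes both nodes too.
    static class EdgeWithNodeRefs implements Serializable {
        private static final long serialVersionUID = 1L;
        final Node source;
        final Node target;

        EdgeWithNodeRefs(Node source, Node target) {
            this.source = source;
            this.target = target;
        }
    }

    // Edge that stores only the node IDs; nothing else is reachable from it.
    static class EdgeWithNodeIds implements Serializable {
        private static final long serialVersionUID = 1L;
        final int sourceId;
        final int targetId;

        EdgeWithNodeIds(int sourceId, int targetId) {
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    // Returns the number of bytes Java serialization produces for the object.
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Node source = new Node(1, 10_000);
        Node target = new Node(2, 10_000);

        int byRef = serializedSize(new EdgeWithNodeRefs(source, target));
        int byId = serializedSize(new EdgeWithNodeIds(source.id, target.id));

        System.out.println("edge with node references: " + byRef + " bytes");
        System.out.println("edge with node IDs only:   " + byId + " bytes");
    }
}
```

In this sketch the reference-holding edge serializes to more than the combined payload of both nodes, whereas the ID-only edge stays at a small size that does not depend on the nodes at all. With many edges in a large topology, that difference is what inflates the serialized configuration.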
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)