[ https://issues.apache.org/jira/browse/FLINK-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536476#comment-17536476 ]
Zhu Zhu commented on FLINK-17295:
---------------------------------
I had another thought and did some experiments, and now I prefer to introduce
an ExecutionGraphID (a random AbstractID), as [~chesnay] once proposed. In
this way, an ExecutionAttemptID will be a combination of (ExecutionGraphID,
ExecutionVertexID, attemptNumber).
The ExecutionGraphID will be regenerated each time an execution graph is
created, so it can be expected to be unique across similarly shaped jobs, job
re-submissions, rescaled jobs and JM failovers. The same ExecutionGraphID
always points to the same execution graph instance, and one execution graph
instance never creates two executions with the same ExecutionVertexID (i.e.
the same JobVertexID and subtaskIndex) and the same attemptNumber.
Therefore, I think it's as safe as maintaining a random AbstractID for each
ExecutionAttemptID.
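To make the proposed composition concrete, here is a minimal sketch of the three IDs as Java records. The class names (ExecutionGraphId, ExecutionVertexId, ExecutionAttemptId, AttemptIdDemo), the belongsTo helper and the String stand-in for JobVertexID are all hypothetical illustrations, not Flink's actual classes; the graph ID is modeled as a random 128-bit value made of two longs, as an assumed stand-in for AbstractID. It shows that two attempts are equal exactly when graph, vertex and attempt number all match, and how an attempt can be matched to the graph instance that created it (benefit 2 below).

```java
import java.security.SecureRandom;
import java.util.Objects;

// Hypothetical stand-in for an ExecutionGraphID: a random 128-bit value (two longs),
// generated once per execution graph instance.
record ExecutionGraphId(long upper, long lower) {
    private static final SecureRandom RANDOM = new SecureRandom();

    static ExecutionGraphId random() {
        return new ExecutionGraphId(RANDOM.nextLong(), RANDOM.nextLong());
    }
}

// Hypothetical ExecutionVertexID: job vertex (modeled as a String for brevity) + subtask index.
record ExecutionVertexId(String jobVertexId, int subtaskIndex) {}

// Hypothetical composite attempt ID: (ExecutionGraphID, ExecutionVertexID, attemptNumber).
// Record equality compares all three components.
record ExecutionAttemptId(ExecutionGraphId graphId, ExecutionVertexId vertexId, int attemptNumber) {
    ExecutionAttemptId {
        Objects.requireNonNull(graphId, "graphId");
        Objects.requireNonNull(vertexId, "vertexId");
        if (attemptNumber < 0) {
            throw new IllegalArgumentException("attemptNumber must be >= 0");
        }
    }

    // True if this attempt was created by the given execution graph instance.
    boolean belongsTo(ExecutionGraphId graph) {
        return graphId.equals(graph);
    }
}

public class AttemptIdDemo {
    public static void main(String[] args) {
        ExecutionGraphId graphA = ExecutionGraphId.random();
        ExecutionGraphId graphB = ExecutionGraphId.random(); // e.g. a graph regenerated after rescaling
        ExecutionVertexId vertex = new ExecutionVertexId("map", 3);

        ExecutionAttemptId a0 = new ExecutionAttemptId(graphA, vertex, 0);
        ExecutionAttemptId a1 = new ExecutionAttemptId(graphA, vertex, 1);
        ExecutionAttemptId b0 = new ExecutionAttemptId(graphB, vertex, 0);

        System.out.println(a0.equals(new ExecutionAttemptId(graphA, vertex, 0))); // true
        System.out.println(a0.equals(a1)); // false: different attempt number
        // Both false unless the two random graph IDs happen to collide.
        System.out.println(a0.equals(b0));
        System.out.println(b0.belongsTo(graphA));
    }
}
```

Only the graph ID is random here; uniqueness inside one graph follows from the (vertex, attemptNumber) pair, which is the argument made in the comment above.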
The benefits of introducing an ExecutionGraphID include:
1. Smaller task deployment descriptors (TDDs). For large-scale jobs, the TDD
size can be about 70% smaller (parallelism=10000 in the benchmark below),
which can speed up task deployment.
2. Easier matching of an execution to its corresponding ExecutionGraph. This
can be helpful in reactive mode, which may regenerate the execution graph
multiple times.
3. Lower probability of ID collisions. Once the ExecutionGraphID is decided,
no two executions of that graph can have the same ID, even if there are
hundreds of thousands of executions in a job.
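To put benefit 3 in numbers, a back-of-the-envelope birthday (union) bound, assuming every random ID is a uniformly random 128-bit value (AbstractID is assumed here to be two longs). With a random ID per attempt, n is the number of executions; with an ExecutionGraphID, only graph IDs are random, so n becomes the number of execution graphs, and attempts within one graph cannot collide at all because their (ExecutionVertexID, attemptNumber) pairs are distinct.

```latex
% Union bound over all pairs of n uniformly random 128-bit IDs
P_{\text{collision}}(n) \;\le\; \binom{n}{2}\,2^{-128} \;=\; \frac{n(n-1)}{2^{129}}

% Example: one random ID per attempt, n = 5 \times 10^{5} executions
P_{\text{collision}} \;\lesssim\; \frac{(5\times10^{5})^{2}}{2^{129}}
  \approx \frac{2.5\times10^{11}}{6.8\times10^{38}} \approx 3.7\times10^{-28}
```

Both numbers are tiny; the proposal mainly shrinks n from "all executions" to "all execution graphs".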
Below are the benchmark results for the TDD size (in bytes):
||ID pattern||AbstractID (current state)||AbstractID + ExecutionVertexID + attemptNumber (int) (proposal #1)||ExecutionGraphID + ExecutionVertexID + attemptNumber (int) (proposal #2)||
|parallelism=10|7,034|7,526|7,464|
|parallelism=100|8,949|9,688|7,950|
|parallelism=1000|27,968|31,717|13,148|
|parallelism=10000|217,081|251,795|65,885|
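The relative size changes can be derived directly from the table. The small program below (class and method names are illustrative only) computes each proposal's change versus the current state, with negative values meaning a smaller TDD.

```java
import java.util.Locale;

public class TddSizeChange {
    // TDD sizes in bytes, copied from the benchmark table:
    // {parallelism, current (AbstractID), proposal #1, proposal #2}
    static final long[][] SIZES = {
        {10, 7_034, 7_526, 7_464},
        {100, 8_949, 9_688, 7_950},
        {1_000, 27_968, 31_717, 13_148},
        {10_000, 217_081, 251_795, 65_885},
    };

    // Relative change of a proposal versus the current state (negative = smaller).
    static double relativeChange(long current, long proposed) {
        return (double) (proposed - current) / current;
    }

    public static void main(String[] args) {
        for (long[] row : SIZES) {
            System.out.printf(Locale.ROOT,
                "parallelism=%d: proposal #1 %+.1f%%, proposal #2 %+.1f%%%n",
                row[0], 100 * relativeChange(row[1], row[2]), 100 * relativeChange(row[1], row[3]));
        }
    }
}
```

At parallelism=10000, proposal #2 comes out roughly 70% smaller than the current state, while proposal #1 is about 16% larger. At parallelism=10, proposal #2 is still about 6% larger, so the savings only appear as parallelism grows.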
> Refactor the ExecutionAttemptID to consist of ExecutionVertexID and
> attemptNumber
> ---------------------------------------------------------------------------------
>
> Key: FLINK-17295
> URL: https://issues.apache.org/jira/browse/FLINK-17295
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Yangze Guo
> Assignee: Zhu Zhu
> Priority: Major
> Labels: pull-request-available
>
> Make the ExecutionAttemptID be composed of (ExecutionVertexID,
> attemptNumber).
--
This message was sent by Atlassian Jira
(v8.20.7#820007)