[ 
https://issues.apache.org/jira/browse/FLINK-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536476#comment-17536476
 ] 

Zhu Zhu commented on FLINK-17295:
---------------------------------

After some more thought and a few experiments, I now prefer to introduce an 
ExecutionGraphID (a random AbstractID), as [~chesnay] once proposed. With this 
approach, an ExecutionAttemptID would be the combination of (ExecutionGraphID, 
ExecutionVertexID, attemptNumber).

 

The ExecutionGraphID will be regenerated each time an execution graph is 
created, so it can be unique across similarly shaped jobs, job re-submissions, 
rescaled jobs and JM failovers. The same ExecutionGraphID always points to the 
same execution graph instance, and one execution graph instance can never 
create two executions with the same ExecutionVertexID (that is, the same 
JobVertexID and the same subtaskIndex) and the same attemptNumber. 
Therefore, I think it is as safe as maintaining a random AbstractID 
for each ExecutionAttemptID.
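
To make the composition concrete, here is a minimal Java sketch of such a 
composite ID. It is only an illustration under stated assumptions, not Flink's 
actual implementation: the class name AttemptIdSketch, the belongsTo helper, 
the use of java.util.UUID in place of a random AbstractID, and the String used 
in place of a JobVertexID are all hypothetical stand-ins.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical sketch; a simplified stand-in, not Flink's ExecutionAttemptID. */
final class AttemptIdSketch {
    private final UUID executionGraphId; // stand-in for the random ExecutionGraphID
    private final String jobVertexId;    // stand-in for JobVertexID
    private final int subtaskIndex;      // with jobVertexId, forms the ExecutionVertexID
    private final int attemptNumber;

    AttemptIdSketch(UUID executionGraphId, String jobVertexId,
                    int subtaskIndex, int attemptNumber) {
        this.executionGraphId = Objects.requireNonNull(executionGraphId);
        this.jobVertexId = Objects.requireNonNull(jobVertexId);
        this.subtaskIndex = subtaskIndex;
        this.attemptNumber = attemptNumber;
    }

    /** True if this attempt was created by the execution graph with the given ID. */
    boolean belongsTo(UUID graphId) {
        return executionGraphId.equals(graphId);
    }

    /** Two IDs are equal only if all four components match. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AttemptIdSketch)) return false;
        AttemptIdSketch that = (AttemptIdSketch) o;
        return subtaskIndex == that.subtaskIndex
                && attemptNumber == that.attemptNumber
                && executionGraphId.equals(that.executionGraphId)
                && jobVertexId.equals(that.jobVertexId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(executionGraphId, jobVertexId, subtaskIndex, attemptNumber);
    }

    @Override
    public String toString() {
        return executionGraphId + "_" + jobVertexId + "_" + subtaskIndex + "_" + attemptNumber;
    }
}
```

In this sketch the only random component is the graph ID, which is drawn once 
per execution graph. The other components are determined by the graph itself, 
so two attempts can only be equal if they come from the same graph, vertex, 
subtask and attempt number.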

 

The benefits of introducing an ExecutionGraphID include:
1. Smaller task deployment descriptors (TDDs). For large-scale jobs, the TDD 
can be about 70% smaller, which can speed up task deployment.
2. Easier matching of an execution to its ExecutionGraph. This can 
be helpful in reactive mode, which may regenerate the execution graph 
multiple times.
3. Lower probability of ID collisions. Once the ExecutionGraphID is decided, no 
execution ID collision can happen, even if there are hundreds of thousands 
of executions in a job.

 

Below are the benchmark results for the TDD size (in bytes):
||ID pattern||AbstractID (current state)||AbstractID + ExecutionVertexID + attemptNumber(int) (proposal #1)||ExecutionGraphID + ExecutionVertexID + attemptNumber(int) (proposal #2)||
|parallelism=10|7,034|7,526|7,464|
|parallelism=100|8,949|9,688|7,950|
|parallelism=1000|27,968|31,717|13,148|
|parallelism=10000|217,081|251,795|65,885|
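
The relative size changes can be recomputed from the table with a small Java 
sketch like the one below. The class and method names (TddSizeReduction, 
percentChange, report) are hypothetical and only serve the illustration; the 
numbers are copied from the table above.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes relative TDD size changes from the table. */
final class TddSizeReduction {
    // Columns: parallelism, current state, proposal #1, proposal #2 (sizes in bytes).
    static final long[][] ROWS = {
        {10, 7_034, 7_526, 7_464},
        {100, 8_949, 9_688, 7_950},
        {1_000, 27_968, 31_717, 13_148},
        {10_000, 217_081, 251_795, 65_885},
    };

    /** Percentage change of candidate relative to baseline; negative means smaller. */
    static double percentChange(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    /** Builds one line per table row with the change for both proposals. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (long[] r : ROWS) {
            sb.append(String.format(Locale.ROOT,
                    "parallelism=%d: proposal #1 %+.1f%%, proposal #2 %+.1f%%%n",
                    r[0], percentChange(r[1], r[2]), percentChange(r[1], r[3])));
        }
        return sb.toString();
    }
}
```

Calling TddSizeReduction.report() shows that proposal #2 is roughly 6% larger 
than the current state at parallelism=10 and roughly 70% smaller at 
parallelism=10000, while proposal #1 is larger than the current state in every 
row.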

> Refactor the ExecutionAttemptID to consist of ExecutionVertexID and 
> attemptNumber
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-17295
>                 URL: https://issues.apache.org/jira/browse/FLINK-17295
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Yangze Guo
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>
> Make the ExecutionAttemptID be composed of (ExecutionVertexID, 
> attemptNumber).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
