[jira] [Commented] (FLINK-17295) Refactor the ExecutionAttemptID to consist of ExecutionVertexID and attemptNumber

Yangze Guo (Jira) Mon, 21 Dec 2020 21:03:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253275#comment-17253275
 ]


Yangze Guo commented on FLINK-17295:
------------------------------------

Hi there. Now that 1.12 has been released, I'd like to revive this ticket.

Initially, this ticket proposed composing the ExecutionAttemptID of 
(ExecutionVertexID, attemptNumber) to improve log readability. 
In FLINK-19264, we found that this change broke the assumption that 
ExecutionAttemptIDs are unique, because VertexIDs collide across 
graphs with the same topology. We then decided to add the JobID to it. 
However, in FLINK-19805, we found that it still has some bad cases.
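
To make the collision from FLINK-19264 concrete, here is a minimal sketch in 
Python. It is not Flink code: the classes and string IDs below are 
hypothetical stand-ins for the Java types discussed in this ticket, and it 
assumes that two jobs with the same topology end up with identical vertex IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionVertexID:
    # Hypothetical stand-in: a job vertex ID plus the subtask index.
    job_vertex_id: str
    subtask_index: int


@dataclass(frozen=True)
class VertexAttemptID:
    # The originally proposed layout: (ExecutionVertexID, attemptNumber).
    vertex_id: ExecutionVertexID
    attempt_number: int


@dataclass(frozen=True)
class JobScopedAttemptID:
    # The layout after adding the JobID.
    job_id: str
    vertex_id: ExecutionVertexID
    attempt_number: int


# Assumption: two jobs with the same topology share the same vertex IDs.
vertex = ExecutionVertexID(job_vertex_id="vertex-a", subtask_index=0)

# Without the JobID, the first attempt of the same vertex is
# indistinguishable between the two jobs.
collides_without_job_id = VertexAttemptID(vertex, 0) == VertexAttemptID(vertex, 0)

# With the JobID, the two attempts get distinct IDs.
collides_with_job_id = (
    JobScopedAttemptID("job-1", vertex, 0) == JobScopedAttemptID("job-2", vertex, 0)
)

print(collides_without_job_id, collides_with_job_id)
```

The JobID only separates attempts that belong to different jobs. It does not 
help when the same job produces the same (vertex, attemptNumber) pair twice.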

To solve the problem in FLINK-19805, we can either:
- Introduce a field that identifies the leader session, or ensure that the 
attempt number increases monotonically across sessions.
- Introduce a truly random element. This seems to be the safest way to prevent 
other rare cases.
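
The two options above could be sketched roughly as follows. This is a Python 
illustration, not Flink code. It assumes that the bad case is one where the 
attempt number restarts after a leader change, which is what the first option 
suggests. All class, field, and function names are hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SessionScopedAttemptID:
    # Option 1: add a field that identifies the leader session.
    job_id: str
    vertex_id: str
    attempt_number: int
    leader_session: int


def new_random_element(bits: int = 16) -> int:
    # Option 2: a random element drawn from the OS entropy source.
    return secrets.randbits(bits)


@dataclass(frozen=True)
class RandomizedAttemptID:
    job_id: str
    vertex_id: str
    attempt_number: int
    random_element: int = field(default_factory=new_random_element)


# Assumed bad case: same job, same vertex, and the attempt number
# restarts at 0 after a leader change.
before_failover = ("job-1", "vertex-a_0", 0)
after_failover = ("job-1", "vertex-a_0", 0)
collides_across_sessions = before_failover == after_failover

# Option 1 separates the two attempts deterministically.
session_ids_differ = (
    SessionScopedAttemptID(*before_failover, leader_session=1)
    != SessionScopedAttemptID(*after_failover, leader_session=2)
)

# Option 2 separates them only probabilistically: with 16 bits, two IDs
# that are otherwise equal still match with probability 1 / 65536.
randomized = RandomizedAttemptID(*after_failover)

print(collides_across_sessions, session_ids_differ, randomized.random_element)
```

The session field covers exactly the case it was designed for, while the 
random element does not depend on knowing in advance which cases can occur. 
That is the trade-off between the two options.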

Considering the serialization overhead, an attempt counter (stored 
in ZK/ConfigMap) might be the better choice. Adding a truly random element 
(16 bits) increased the TDD size by ~25% in my experiment (WordCount with 
parallelism 3000). However, with the counter we can't be sure that no new bad 
cases will appear in the future. If the increase in TDD size is affordable, I 
lean towards introducing a truly random element.

WDYT?

> Refactor the ExecutionAttemptID to consist of ExecutionVertexID and 
> attemptNumber
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-17295
>                 URL: https://issues.apache.org/jira/browse/FLINK-17295
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Yangze Guo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.0
>
>
> Make the ExecutionAttemptID being composed of (ExecutionVertexID, 
> attemptNumber).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
