[jira] [Created] (FLINK-23194) Cache and reuse the ContainerLaunchContext and accelerate the process of createTaskExecutorLaunchContext on YARN

zlzhang0122 (Jira) Wed, 30 Jun 2021 09:02:08 -0700

zlzhang0122 created FLINK-23194:
-----------------------------------

             Summary: Cache and reuse the ContainerLaunchContext and accelerate 
the process of createTaskExecutorLaunchContext on YARN
                 Key: FLINK-23194
                 URL: https://issues.apache.org/jira/browse/FLINK-23194
             Project: Flink
          Issue Type: Improvement
          Components: Deployment / YARN
    Affects Versions: 1.12.4, 1.13.1
            Reporter: zlzhang0122
             Fix For: 1.14.0



When TaskExecutors are started in containers on YARN, a ContainerLaunchContext 
is created n times (where n is the number of TaskManagers).

When I examined this creation process, I found that most of what is built is 
common to all TaskManagers and does not depend on any particular TaskManager, 
except for the launchCommand. We can therefore create the ContainerLaunchContext 
once and reuse it; only the launchCommand needs to be created separately for 
each TaskManager.

So I propose that we cache and reuse the ContainerLaunchContext object to 
accelerate this creation process.
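The reuse idea can be sketched in plain Java as below. This is a minimal, hypothetical illustration, not Flink's actual implementation: `SharedPart`, `LaunchContext` and `CachingLaunchContextFactory` are stand-ins invented here for YARN's `ContainerLaunchContext` and Flink's launch-context code, and the shared part is assumed to consist of local resources and environment variables. The TaskManager-independent part is built lazily at most once and shared, while each call only creates the per-TaskManager launch command.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: cache the TaskManager-independent part of a launch
 * context and create only the launch command per TaskManager.
 * None of these types are real Flink or Hadoop classes.
 */
public class CachingLaunchContextFactory {

    /** Part assumed identical for every TaskManager; copied to immutable maps so sharing is safe. */
    public record SharedPart(Map<String, String> localResources, Map<String, String> environment) {
        public SharedPart {
            localResources = Map.copyOf(localResources);
            environment = Map.copyOf(environment);
        }
    }

    /** Per-TaskManager context: the shared part plus its own command. */
    public record LaunchContext(SharedPart shared, List<String> commands) {}

    private final Supplier<SharedPart> sharedPartBuilder; // the expensive step (e.g. resource lookups)
    private final AtomicInteger builds = new AtomicInteger();
    private volatile SharedPart cached; // built lazily, at most once

    public CachingLaunchContextFactory(Supplier<SharedPart> sharedPartBuilder) {
        this.sharedPartBuilder = sharedPartBuilder;
    }

    /** Double-checked locking: concurrent callers trigger at most one build. */
    private SharedPart sharedPart() {
        SharedPart result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {
                    builds.incrementAndGet();
                    result = sharedPartBuilder.get();
                    cached = result;
                }
            }
        }
        return result;
    }

    /** Only the launch command is created for each TaskManager. */
    public LaunchContext createTaskExecutorLaunchContext(String launchCommand) {
        return new LaunchContext(sharedPart(), List.of(launchCommand));
    }

    /** Number of times the shared part was actually built. */
    public int sharedPartBuilds() {
        return builds.get();
    }
}
```

Because the cached part is handed to every container, the sketch makes it immutable with `Map.copyOf`. The real YARN `ContainerLaunchContext` is mutable (it exposes setters such as `setCommands`), so an actual implementation would need to make sure per-TaskManager changes never leak into the cached instance.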

I think this would bring the following benefits:
 # It accelerates the creation of the ContainerLaunchContext and therefore the 
startup of the TaskExecutors, especially when there is a large number of 
TaskManagers.
 # It reduces the load on HDFS, etc.
 # It also reduces exposure to sudden failures of HDFS, YARN, etc.

We have implemented this in our production environment. So far there have been 
no problems and it has brought clear benefits. Let me know if there is anything 
I haven't considered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
