[
https://issues.apache.org/jira/browse/FLINK-18738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176128#comment-17176128
]
Xintong Song commented on FLINK-18738:
--------------------------------------
Thanks for the feedback, [~trohrmann] & [~sewen].
To [~trohrmann],
{quote}I think it would be nice to solve this problem not only for Python but
also for other languages we might want to support in the future (if possible
and if it does not broaden the scope too massively).
{quote}
I like the idea of not only solving this problem for Python, but providing a
general mechanism for all non-Java languages we might support in the future.
{quote}The first question which comes to my mind is whether the
{{TaskExecutor}} should be responsible for managing the component running the
Python (or any other language) code or not. If there is no good reason for the
{{TaskExecutor}} to manage this component, it could make the {{TaskExecutor}}
simpler by not aggregating too many responsibilities. If the language execution
service benefitted from a tight integration in the execution of a job (e.g. by
allocating and releasing resources swiftly) on the other hand, then a tighter
integration could make sense.
{quote}
I think the biggest benefit of integrating the external language execution
service into task managers is that we can reuse the existing task scheduling
and deployment mechanisms. Specifically, to manage the external language
execution service completely independently of task managers, we would probably
need to deal with the following issues.
* Lifecycle management: Active resource managers or users need to request
resources for the external language executors and launch them. Resource
managers need to accept executor registrations and establish connections and
heartbeats.
* Scheduling: Do external language executors follow the slot-based scheduling?
If not, how do we decide how many tasks should be scheduled onto each executor,
and onto which executor a task should be scheduled?
* Deploying: Who is responsible for deploying tasks into external language
executors, the job masters or the Java tasks in task managers? The former
requires the job master to be aware of different kinds of executors and to
establish connections with each of them. For the latter, how do the Java tasks
know onto which executor to deploy the workload?
* Do the external language executors need to support data shuffling?
* How do they access the state backends? Do the external executors need to be
co-located with the task managers?
For most of the above-mentioned issues, we already have solutions in the task
managers. It would be much simpler if we could reuse them directly. Integrating
the external language execution service with task managers might complicate the
task manager a bit, but should make things simpler for the other components.
We could further move this complexity out of the {{TaskExecutor}} with a
plugin-based approach, similar to state backends. I will describe this later.
{quote}Whether a supported language supports code isolation or not should not
necessarily decide which process model to choose.
{quote}
Makes sense to me.
{quote}Concerning the proposed memory management: Does option 2. entail that
one enables the cluster to run with Python at deployment time of the
{{TaskExecutor}}? Differently asked will we reserve a fixed amount of memory
for the Python process independent of whether we actually run a Python workload
or not?
{quote}
True. We could make the Python memory 0 by default, but option 2) means that as
long as it is configured non-zero, the memory will be reserved whether or not a
Python workload actually runs.
To [~sewen],
{quote}Generally, going with one Python process per slot seems fair, but that
means the Python process executes multiple operators.
How well does that work? Is there one gRPC connection that handles all
operators, or are there multiple streams between the processes (one per
operator)? Are there any issues with GIL or so that impact performance?
{quote}
I'm not absolutely sure about this. Maybe [~dian.fu] could share some input.
{quote}The managed memory integration on the other hand could be quite simple.
The Python process could be a simple external resource claiming budget from the
Memory Manager. The per-slot bookkeeping, the singleton initialization, ref
counting, thread-safe shutdown, all of that is already there from the RocksDB
integration. It should be straightforward to just reuse this.
The advantage of this is that there is
- simple, no extended memory model
- no wasted memory in session clusters that are used in a mixed way (this is
probably not too important to optimize for, though).{quote}
Thanks, Stephan. I think reusing the shared resource mechanism in the memory
manager is a really good suggestion. No wasted memory is an important
advantage, not only for session clusters, but also for scenarios with multiple
slot sharing groups and for batch workloads.
The shortcoming is, as previously mentioned, the coupling between RocksDB and
Python memory. I think we should be able to work that out, and the benefit
sounds worth the effort.
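To illustrate the shared resource mechanism Stephan mentions, here is a
self-contained sketch. Note that the {{SharedResource}} class below is a
hypothetical simplification for illustration, not Flink's actual
{{MemoryManager}} API: it only shows the core pattern of lazy singleton
initialization plus per-user reference counting that would let a Python process
claim a fixed budget only while it is actually in use.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

// Hypothetical sketch of a ref-counted shared memory resource, similar in
// spirit to what the RocksDB integration uses: the first acquirer initializes
// the resource with the reserved budget, later acquirers share it, and the
// resource is torn down only when the last user releases it.
class SharedResource<T> {
    private final long budgetBytes;
    private final LongFunction<T> initializer;
    private final AtomicInteger refCount = new AtomicInteger(0);
    private T resource; // lazily created singleton

    SharedResource(long budgetBytes, LongFunction<T> initializer) {
        this.budgetBytes = budgetBytes;
        this.initializer = initializer;
    }

    synchronized T acquire() {
        if (refCount.getAndIncrement() == 0) {
            // First user pays the initialization, e.g. launching the process.
            resource = initializer.apply(budgetBytes);
        }
        return resource;
    }

    synchronized void release() {
        if (refCount.decrementAndGet() == 0) {
            resource = null; // last user shuts the resource down
        }
    }

    int currentRefCount() {
        return refCount.get();
    }
}
```

With this pattern, a Python process would (like a RocksDB block cache) be
created when the first operator in a slot needs it and shut down when the last
one releases it, so no memory is held while no Python workload runs.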
*External Language Execution Framework*
The idea of reusing the memory sharing mechanism from RocksDB made me realize
how similar the external language execution service is to the RocksDB state
backend. Basically, both claim a memory budget from the memory manager and
provide services to operators.
If we could define some sort of {{ExternalExecutionBackend}} interface, then
most of the details and complexity could be hidden in the implementation of the
interface. The task manager would not need to know how the backend uses the
memory, whether it calls a library, launches a process, or connects to an
external service, etc. Whether the backends share the same Python process or
each uses a dedicated process would also be hidden inside the backend
implementation.
We could easily extend Flink with support for more languages by providing new
implementations of the backend interface. Ideally, the process model and memory
management of the Flink framework would need no changes.
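A rough sketch of what such a backend contract could look like. Everything
here, including the name {{ExternalExecutionBackend}} and its method set, is a
hypothetical illustration of this proposal, not an existing Flink interface;
the trivial in-JVM implementation only demonstrates the contract, where a real
backend would e.g. launch a Python process and talk to it over gRPC.

```java
// Hypothetical interface illustrating the proposal; not an existing Flink API.
// A backend hides whether it calls a library, launches a local process, or
// connects to an external service, analogous to how state backends hide their
// storage details from the task manager.
interface ExternalExecutionBackend extends AutoCloseable {
    // Claim the memory budget (e.g. from the memory manager) and start
    // whatever execution engine the backend needs.
    void open(long memoryBudgetBytes) throws Exception;

    // Hand a serialized unit of work to the external engine and return the
    // serialized result. A real design would likely be asynchronous/streaming.
    byte[] execute(byte[] serializedTask) throws Exception;

    // Release processes, connections, and memory.
    @Override
    void close() throws Exception;
}

// A trivial in-JVM implementation used only to show the contract.
class EchoBackend implements ExternalExecutionBackend {
    private boolean open;

    @Override
    public void open(long memoryBudgetBytes) {
        this.open = true;
    }

    @Override
    public byte[] execute(byte[] serializedTask) {
        if (!open) {
            throw new IllegalStateException("backend not opened");
        }
        return serializedTask.clone(); // echo the payload back
    }

    @Override
    public void close() {
        this.open = false;
    }
}
```

Adding a new language would then mean shipping a new implementation of this
interface, without touching the task manager's process model or memory
accounting.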
> Revisit resource management model for python processes.
> -------------------------------------------------------
>
> Key: FLINK-18738
> URL: https://issues.apache.org/jira/browse/FLINK-18738
> Project: Flink
> Issue Type: Task
> Components: API / Python, Runtime / Coordination
> Reporter: Xintong Song
> Assignee: Xintong Song
> Priority: Major
> Fix For: 1.12.0
>
>
> This ticket is for tracking the effort towards a proper long-term resource
> management model for python processes.
> In FLINK-17923, we ran into problems because python processes are not well
> integrated with the task manager resource management mechanism. A temporary
> workaround has been merged for release-1.11.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)