Hi Wei,

Thanks a lot for your feedback. Very good questions!

>>> 1. It seems that we dynamically load an embedded Python and user
dependencies in the TM process. Can they be uninstalled cleanly after the
task finished? i.e. Can we use the Thread Mode in session mode and Pyflink
shell?

I mentioned this limitation in the FLIP. As long as the Python interpreter
does not change, there is no problem. If the interpreter does need to
change, however, there is no way to reload the Python library. The main
cause is that many Python libraries assume they own the process alone.
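
As an analogy only (plain Python, not PyFlink or PEMJA code), here is a
minimal sketch of what "owning the process" means. FakeNativeLib and the
interpreter paths are hypothetical; the class stands in for a library that
keeps process-wide state, much like a C extension with static variables.

```python
# Analogy only: a hypothetical library that keeps process-wide state,
# similar in spirit to a C extension with static variables.
class FakeNativeLib:
    _runtime = None  # set once per process, never reset

    @classmethod
    def init(cls, interpreter_path):
        # First caller wins; later callers receive the existing runtime.
        if cls._runtime is None:
            cls._runtime = {"interpreter": interpreter_path}
        return cls._runtime


# Two jobs in the same process (hypothetical paths).
job_a = FakeNativeLib.init("/envs/a/bin/python")
job_b = FakeNativeLib.init("/envs/b/bin/python")  # asks for another interpreter

# Both jobs end up sharing the state created by the first one.
same_runtime = job_a is job_b
```

In this sketch the second job silently gets the first job's state, which is
the kind of behaviour that makes switching interpreters inside a
long-running TM process unsafe.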

>>> 2. Does one TM have only one embedded Python running at the same time?
If all the Python operator in the TM share the same PVM, will there be a
loss in performance?

Your understanding is correct: one TM has only one embedded Python
running at a time. I guess you are worried about the performance loss
that the Python GIL causes for multiple threads. There is a one-to-one
correspondence between Java worker threads and Python subinterpreters.
Although subinterpreters have not yet completely overcome the GIL
sharing problem (the Python community's plan for a per-interpreter GIL
is still under discussion [1]), the performance of subinterpreters is
very close to that of multiprocessing [2].

>>> 3. How do we load the relevant c library if the python.executable is
provided by users?

Once python.executable is provided, PEMJA will dynamically load the CPython
library (libpython.*so or libpython.*dylib) and the pemja.so installed in
that Python environment.
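
To give a feel for how a loader can find the CPython library from nothing
but an executable path, here is a minimal sketch. It is not PEMJA's actual
lookup logic, which I am not reproducing here; libpython_candidates is a
hypothetical helper that asks the given interpreter for its own sysconfig
values.

```python
import json
import subprocess
import sys
from pathlib import Path


def libpython_candidates(python_executable):
    """Ask the given interpreter where its CPython library should live.

    Hypothetical helper for illustration; not PEMJA's lookup logic.
    """
    probe = (
        "import json, sysconfig;"
        "print(json.dumps({k: sysconfig.get_config_var(k) "
        "for k in ('LIBDIR', 'LDLIBRARY', 'INSTSONAME')}))"
    )
    out = subprocess.run(
        [python_executable, "-c", probe],
        check=True, capture_output=True, text=True,
    ).stdout
    cfg = json.loads(out)

    libdir = cfg.get("LIBDIR")
    candidates = []
    # Values can be missing on some platforms, and LDLIBRARY may name a
    # static archive when the build has no shared library.
    for key in ("INSTSONAME", "LDLIBRARY"):
        name = cfg.get(key)
        if libdir and name:
            path = str(Path(libdir) / name)
            if path not in candidates:
                candidates.append(path)
    return candidates


# Use the current interpreter as a stand-in for python.executable.
candidates = libpython_candidates(sys.executable)
```

The returned paths are only candidates: a real loader would still need to
check that the file exists and is a shared library before loading it.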

>>> May there be a risk of version conflicts?

I understand this question as asking whether C/C++ has a way to handle
dependencies on different versions of the same library. First of all, if
only static linking is involved, the problem does not arise. I have
studied the CPython source code [3], and it does not use dynamic linking.
That leaves the case where a C library written by users relies on dynamic
linking. There are many ways to resolve conflicts with dynamic linking,
but since the library is written by users, it is difficult for us to
guarantee that there will be no conflicts. In that case, Process Mode
will be the fallback.

[1]
https://mail.python.org/archives/list/python-...@python.org/thread/S5GZZCEREZLA2PEMTVFBCDM52H4JSENR/#RIK75U3ROEHWZL4VENQSQECB4F4GDELV
[2]
https://mail.python.org/archives/list/python-...@python.org/thread/PNLBJBNIQDMG2YYGPBCTGOKOAVXRBJWY/#L5OXHXPFONRKLR3W6U46LUSUIBN4FCZQ
[3] https://github.com/python/cpython

Best,
Xingbo

On Fri, Dec 31, 2021 at 11:49, Wei Zhong <weizhong0...@gmail.com> wrote:

> Hi Xingbo,
>
> Thanks for creating this FLIP. Big +1 for it!
>
> I have some questions about the Thread Mode:
>
> 1. It seems that we dynamically load an embedded Python and user
> dependencies in the TM process. Can they be uninstalled cleanly after the
> task finished? i.e. Can we use the Thread Mode in session mode and Pyflink
> shell?
>
> 2. Does one TM have only one embedded Python running at the same time? If
> all the Python operator in the TM share the same PVM, will there be a loss
> in performance?
>
> 3. How do we load the relevant c library if the python.executable is
> provided by users? May there be a risk of version conflicts?
>
> Best,
> Wei
>
>
> > On Dec 29, 2021, at 11:56 AM, Xingbo Huang <hxbks...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I would like to start a discussion thread on "Support PyFlink Runtime
> > Execution in Thread Mode"
> >
> > We have provided PyFlink Runtime framework to support Python user-defined
> > functions since Flink 1.10. The PyFlink Runtime framework is called
> > Process Mode, which depends on an inter-process communication architecture
> > based on the Apache Beam Portability framework. Although starting a
> > dedicated process to execute Python user-defined functions could have
> > better resource isolation, it will bring greater resource and performance
> > overhead.
> >
> > In order to overcome the resource and performance problems on Process
> > Mode, we will propose a new execution mode which executes Python
> > user-defined functions in the same thread instead of a separate process.
> >
> > I have drafted the FLIP-206[1]. Please feel free to reply to this email
> > thread. Looking forward to your feedback!
> >
> > Best,
> > Xingbo
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode
>
>
