Re: [DISCUSS] PyFlink User-Defined Function Resource Management

Dian Fu Thu, 05 Dec 2019 19:34:08 -0800

Hi Jingsong,

Thanks for sharing. This is very helpful, as the Python operator will take a
similar approach.


Thanks,
Dian

> On Dec 6, 2019, at 11:12 AM, Jingsong Li <jingsongl...@gmail.com> wrote:
> 
> Hi Dian,
> 
> After [1] and [2], in the batch SQL world:
> - [2] On the client/compile side, we use memory weights to request memory
> for each Transformation.
> - [1] On the runtime side, we use the memory fraction to compute the memory
> size and allocate it in the StreamOperator.
> For your information.
> 
> [1] https://jira.apache.org/jira/browse/FLINK-14063
> [2] https://jira.apache.org/jira/browse/FLINK-15035
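As a rough illustration of the weight/fraction split described above, here is a minimal Python sketch. It assumes that the fractions of all memory-using operators in one slot are obtained by normalizing their compile-time weights, and that each operator multiplies its fraction by the slot's managed memory at runtime. The function and operator names are hypothetical; Flink's actual Java implementation (FLINK-14063 / FLINK-15035) may compute these differently.

```python
from typing import Dict


def compute_fractions(weights: Dict[str, int]) -> Dict[str, float]:
    """Compile side (hypothetical): turn per-operator memory weights into fractions summing to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one operator must request memory")
    return {op: w / total for op, w in weights.items()}


def memory_size(fraction: float, slot_managed_memory: int) -> int:
    """Runtime side (hypothetical): translate an operator's fraction into a byte budget."""
    return int(fraction * slot_managed_memory)


if __name__ == "__main__":
    fractions = compute_fractions({"sort": 3, "python_calc": 1})
    print(fractions)
    print(memory_size(fractions["python_calc"], 1024 ** 3))
```

With weights 3 and 1, the two operators get fractions 0.75 and 0.25, so in a slot with 1 GiB of managed memory the Python operator would allocate 256 MiB. Keeping weights at compile time and bytes at runtime means the plan does not need to know the slot size in advance.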
> 
> Best,
> Jingsong Lee
> 
> On Tue, Dec 3, 2019 at 6:07 PM Dian Fu <dian0511...@gmail.com> wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks for your valuable feedback. I have updated the "Example" section
>> describing how to use these options in a Python Table API program.
>> 
>> Thanks,
>> Dian
>> 
>>> On Dec 2, 2019, at 6:12 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
>>> 
>>> Hi Dian:
>>> 
>>> Thanks for your explanation.
>>> It might be better if you could update the document to explain the changes
>>> to the table layer (just a suggestion; it's up to you).
>>> About the forwardedInputQueue in AbstractPythonScalarFunctionOperator:
>>> will this queue take up a lot of memory?
>>> Can it occupy as much memory as buffer.memory?
>>> If so, are we silently using heap memory here?
>>> It feels a little strange to me: the memory on the Python side is reserved,
>>> but the memory on the JVM side is used silently.
>>> 
>>> After carefully reading your comments on the Google doc:
>>>> The memory used by the Java operator is currently accounted as the task
>>>> on-heap memory. We can revisit this if we find it's a problem in the future.
>>> I agree that we can ignore it for now, but we could add a note to the
>>> document to remind users. What do you think?
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <dian0511...@gmail.com> wrote:
>>> 
>>>> Hi Jingsong,
>>>> 
>>>> Thanks a lot for your comments. Please see my reply inlined below.
>>>> 
>>>>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
>>>>> 
>>>>> Hi Dian:
>>>>> 
>>>>> 
>>>>> Thanks for driving this. I have some questions:
>>>>> 
>>>>> 
>>>>> - Where should these configurations belong? You mentioned Table API/SQL,
>>>>> so should they be in TableConfig?
>>>> 
>>>> All Python-related configurations are defined in PythonOptions. Users can
>>>> set them via TableConfig.getConfiguration.setXXX in Python Table API
>>>> programs.
>>>> 
>>>>> 
>>>>> - If they are only for Table/SQL, should they be named table.python.****,
>>>>> since all config options in the table module are named table.***?
>>>> 
>>>> These configurations are not table-specific. They will be used for both
>>>> Python Table API programs and Python DataStream API programs (which is
>>>> planned to be supported in the future). So python.xxx seems more
>>>> appropriate, what do you think?
>>>> 
>>>>> - What should the table module do? Should CommonPythonCalc read the
>>>>> options from the table config and set the resources on the
>>>>> OneInputTransformation?
>>>> 
>>>> As described in the design doc, in the compilation phase for batch jobs,
>>>> the required memory of the Python worker will be calculated from the
>>>> configuration and set as the managed memory of the operator. For stream
>>>> jobs, the resource spec will be unknown (the reason is that currently the
>>>> resources of all operators in stream jobs are unknown, and configuring both
>>>> known and unknown resources in a single job is not supported).
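A minimal Python sketch of the compile-phase rule Dian describes: batch jobs request a known amount of managed memory for the Python worker, stream jobs leave it unknown, and a job must not mix the two. It assumes the worker's requirement is the sum of a "framework" part and a "buffer" part (the thread only mentions buffer.memory, so this split and all names here are hypothetical), and it models Flink's unknown resource spec as None.

```python
from typing import List, Optional

# Hypothetical model: an int is a known managed-memory requirement in bytes,
# None stands for an "unknown" resource spec.
ResourceSpec = Optional[int]


def python_operator_memory(framework_bytes: int, buffer_bytes: int,
                           is_batch: bool) -> ResourceSpec:
    """Managed memory a Python operator would request at compile time (hypothetical)."""
    if framework_bytes < 0 or buffer_bytes < 0:
        raise ValueError("memory sizes must be non-negative")
    if is_batch:
        # Batch: the Python worker's memory becomes the operator's managed memory.
        return framework_bytes + buffer_bytes
    # Stream: leave the spec unknown, like every other operator in the job.
    return None


def check_job(specs: List[ResourceSpec]) -> None:
    """Reject a job that mixes known and unknown resource specs."""
    known = [spec is not None for spec in specs]
    if any(known) and not all(known):
        raise ValueError("a job cannot mix known and unknown resource specs")
```

For example, python_operator_memory(64 * MB, 15 * MB, True) returns 79 * MB for MB = 1024 * 1024, while the same call with is_batch=False returns None; the numbers are illustrative, not Flink defaults.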
>>>> 
>>>>> - Is all of buffer.memory off-heap memory? I took a look at
>>>>> AbstractPythonScalarFunctionOperator and there is a forwardedInputQueue;
>>>>> is this a heap queue? If so, do we need heap memory too?
>>>> 
>>>> Yes, it is all off-heap memory, which is intended to be used by the
>>>> Python process. The forwardedInputQueue is a buffer used in the Java
>>>> operator, and its memory is accounted for as on-heap memory.
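To make the heap-memory question concrete: a buffer like the forwardedInputQueue holds each input row until the Python process returns the UDF result for it, so its heap footprint grows with the number of in-flight rows. The Python sketch below models that idea only. The class and method names are hypothetical, not Flink's Java implementation, and it assumes results come back in the same order the inputs were sent.

```python
from collections import deque
from typing import Any, Callable, Deque, Tuple


class ForwardingOperator:
    """Hypothetical model of a Java-side operator that keeps inputs on the heap
    until the external Python worker returns the UDF result for each of them."""

    def __init__(self, emit: Callable[[Tuple[Any, Any]], None]) -> None:
        # Grows with every row sent to the Python worker and not yet answered.
        self._forwarded: Deque[Any] = deque()
        self._emit = emit

    def process_element(self, row: Any) -> None:
        # Keep the row so its forwarded fields can be joined with the result later.
        # Actually sending the row to the Python worker is omitted in this sketch.
        self._forwarded.append(row)

    def on_udf_result(self, result: Any) -> None:
        # Assumes results arrive in input order, so the oldest row matches.
        row = self._forwarded.popleft()
        self._emit((row, result))

    def in_flight(self) -> int:
        return len(self._forwarded)
```

In this model the queue's size, and hence its heap usage, is bounded only by how many rows are sent before results return, which is why Jingsong asks whether it can grow as large as the buffer on the Python side.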
>>>> 
>>>> Regards,
>>>> Dian
>>>> 
>>>>> 
>>>>> Looking forward to your reply.
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jingsong Lee
>>>>> 
>>>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>>> 
>>>>>> Thanks for your votes and feedback. I have discussed this with @Zhu Zhu
>>>>>> offline and also in the design doc.
>>>>>> 
>>>>>> It seems that we have reached consensus on the design. I will bring up
>>>>>> the VOTE if there is no further feedback.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dian
>>>>>> 
>>>>>>> On Nov 22, 2019, at 2:51 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>>>>>> It is great to make sure that the resources used by the Python process
>>>>>>> are managed properly by Flink’s resource management framework.
>>>>>>> 
>>>>>>> Also, thanks to the people working on the unified memory management
>>>>>>> framework.
>>>>>>> 
>>>>>>> Best, Hequn
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <karma...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Thanks for driving this discussion, Dian!
>>>>>>>> 
>>>>>>>> +1 for this proposal. It will help reduce container failures caused by
>>>>>>>> memory overuse.
>>>>>>>> I left some comments in the design doc.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Yangze Guo
>>>>>>>> 
>>>>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Sorry for the late reply.
>>>>>>>>> 
>>>>>>>>> +1 for the general proposal.
>>>>>>>>> 
>>>>>>>>> One reminder: to use the UNKNOWN resource requirement, we need to make
>>>>>>>>> sure the optimizer knows which operators use off-heap managed memory,
>>>>>>>>> and computes and sets a fraction for those operators. See FLIP-53[1]
>>>>>>>>> for more details; I would suggest double-checking with @Zhu Zhu, who
>>>>>>>>> works on this part.
>>>>>>>>> 
>>>>>>>>> Thank you~
>>>>>>>>> 
>>>>>>>>> Xintong Song
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>>>>>> 
>>>>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Jincheng,
>>>>>>>>>> 
>>>>>>>>>> Thanks for the reply. I'm also looking forward to feedback from the
>>>>>>>>>> community.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Dian
>>>>>>>>>> 
>>>>>>>>>>> On Nov 11, 2019, at 2:34 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> +1, thanks for bringing up this discussion, Dian!
>>>>>>>>>>> 
>>>>>>>>>>> Resource management is very important for PyFlink UDFs, so it would be
>>>>>>>>>>> great if anyone could add comments or input in the design doc or give
>>>>>>>>>>> feedback on the ML. :)
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jincheng
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Nov 5, 2019 at 11:32 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> FLIP-58[1] will add support for Python user-defined stateless functions
>>>>>>>>>>>> in the Python Table API. A separate Python process will be launched to
>>>>>>>>>>>> execute Python user-defined functions, and the resources used by this
>>>>>>>>>>>> Python process should be managed properly by Flink’s resource
>>>>>>>>>>>> management framework. FLIP-49[2] has proposed a unified memory
>>>>>>>>>>>> management framework, and PyFlink user-defined function resource
>>>>>>>>>>>> management should be based on it. Jincheng, Hequn, Xintong, GuoWei and
>>>>>>>>>>>> I discussed this offline, and I have drafted a design doc[3] to start a
>>>>>>>>>>>> discussion about PyFlink user-defined function resource management.
>>>>>>>>>>>> 
>>>>>>>>>>>> Any comments on the design doc are welcome, as is feedback directly on
>>>>>>>>>>>> the ML.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Dian
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>>>>>>>>>> [3] https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Best, Jingsong Lee
>>>> 
>>>> 
>>> 
>>> --
>>> Best, Jingsong Lee
>> 
>> 
> 
> -- 
> Best, Jingsong Lee
