Hi Jingsong, Appreciated for your sharing. It's very helpful as the Python operator will take the similar way.
Thanks, Dian > 在 2019年12月6日,上午11:12,Jingsong Li <jingsongl...@gmail.com> 写道: > > Hi Dian, > > After [1] and [2], in the batch sql world, we will: > - [2] In client/compile side: we use memory weight request memory for > Transformation. > - [1] In runtime side: we use memory fraction to compute memory size and > allocate in StreamOperator. > For your information. > > [1] https://jira.apache.org/jira/browse/FLINK-14063 > [2] https://jira.apache.org/jira/browse/FLINK-15035 > > Best, > Jingsong Lee > > On Tue, Dec 3, 2019 at 6:07 PM Dian Fu <dian0511...@gmail.com> wrote: > >> Hi Jingsong, >> >> Thanks for your valuable feedback. I have updated the "Example" section >> describing how to use these options in a Python Table API program. >> >> Thanks, >> Dian >> >>> 在 2019年12月2日,下午6:12,Jingsong Lee <lzljs3620...@apache.org> 写道: >>> >>> Hi Dian: >>> >>> Thanks for you explanation. >>> If you can update the document to add explanation for the changes to the >>> table layer, >>> it might be better. (it's just a suggestion, it depends on you) >>> About forwardedInputQueue in AbstractPythonScalarFunctionOperator, >>> Will this queue take up a lot of memory? >>> Can it also occupy memory as large as buffer.memory? >>> If so, what we're dealing with now is the silent use of heap memory? >>> I feel a little strange, because the memory on the python side will >> reserve, >>> but the memory on the JVM side is used silently. >>> >>> After carefully seeing your comments on Google doc: >>>> The memory used by the Java operator is currently accounted as the task >>> on-heap memory. We can revisit this if we find it's a problem in the >> future. >>> I agree that we can ignore it now, But we can add some content to the >>> document to remind the user, What do you think? >>> >>> Best, >>> Jingsong Lee >>> >>> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <dian0511...@gmail.com> wrote: >>> >>>> Hi Jingsong, >>>> >>>> Thanks a lot for your comments. Please see my reply inlined below. >>>> >>>>> 在 2019年12月2日,下午3:47,Jingsong Lee <lzljs3620...@apache.org> 写道: >>>>> >>>>> Hi Dian: >>>>> >>>>> >>>>> Thanks for your driving. I have some questions: >>>>> >>>>> >>>>> - Where should these configurations belong? You have mentioned >>>> tableApi/SQL, >>>>> so should in TableConfig? >>>> >>>> All Python related configurations are defined in PythonOptions. User >> could >>>> configure these configurations via TableConfig.getConfiguration.setXXX >> for >>>> Python Table API programs. >>>> >>>>> >>>>> - If just in table/sql, whether it should be called: table.python.****, >>>>> because in table, all config options are called table.***. >>>> >>>> These configurations are not table specific. They will be used for both >>>> Python Table API programs and Python DataStream API programs (which is >>>> planned to be supported in the future). So python.xxx seems more >>>> appropriate, what do you think? >>>> >>>>> - What should table module do? So in CommonPythonCalc, we should read >>>>> options from table config, and set resources to OneInputTransformation? >>>> >>>> As described in the design doc, in compilation phase, for batch jobs, >> the >>>> required memory of the Python worker will be calculated according to the >>>> configuration and set as the managed memory for the operator. For stream >>>> jobs, the resource spec will be unknown(The reason is that currently the >>>> resources for all the operators in stream jobs are unknown and it >> doesn’t >>>> support to configure both known and unknown resources in a single job). >>>> >>>>> - Are all buffer.memory off-heap memory? I took a look >>>>> to AbstractPythonScalarFunctionOperator, there is a >> forwardedInputQueue, >>>> is >>>>> this one a heap queue? So we need heap memory too? >>>> >>>> Yes, they are all off-heap memory which is supposed to be used by the >>>> Python process. The forwardedInputQueue is a buffer used in the Java >>>> operator and its memory is accounted as the on-heap memory. >>>> >>>> Regards, >>>> Dian >>>> >>>>> >>>>> Hope to get your reply. >>>>> >>>>> >>>>> Best, >>>>> >>>>> Jingsong Lee >>>>> >>>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <dian0511...@gmail.com> >> wrote: >>>>> >>>>>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu >>>>>> offline and also on the design doc. >>>>>> >>>>>> It seems that we have reached consensus on the design. I would bring >> up >>>>>> the VOTE if there is no other feedbacks. >>>>>> >>>>>> Thanks, >>>>>> Dian >>>>>> >>>>>>> 在 2019年11月22日,下午2:51,Hequn Cheng <chenghe...@gmail.com> 写道: >>>>>>> >>>>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this! >>>>>>> It is great to make sure that the resources used by the Python >> process >>>>>> are >>>>>>> managed properly by Flink’s resource management framework. >>>>>>> >>>>>>> Also, thanks to the guys that working on the unified memory >> management >>>>>>> framework. >>>>>>> >>>>>>> Best, Hequn >>>>>>> >>>>>>> >>>>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <karma...@gmail.com> >> wrote: >>>>>>> >>>>>>>> Thanks for driving this discussion, Dian! >>>>>>>> >>>>>>>> +1 for this proposal. It will help to reduce container failure due >> to >>>>>>>> the memory overuse. >>>>>>>> Some comments left in the design doc. >>>>>>>> >>>>>>>> Best, >>>>>>>> Yangze Guo >>>>>>>> >>>>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <tonysong...@gmail.com >>> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Sorry for the late reply. >>>>>>>>> >>>>>>>>> +1 for the general proposal. >>>>>>>>> >>>>>>>>> And one remainder, to use UNKNOWN resource requirement, we need to >>>> make >>>>>>>>> sure optimizer knowns which operators use off-heap managed memory, >>>> and >>>>>>>>> compute and set a fraction to the operators. See FLIP-53[1] for >> more >>>>>>>>> details, and I would suggest you to double check with @Zhu Zhu who >>>>>> works >>>>>>>> on >>>>>>>>> this part. >>>>>>>>> >>>>>>>>> Thank you~ >>>>>>>>> >>>>>>>>> Xintong Song >>>>>>>>> >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>>>>>>> >>>>>> >>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management >>>>>>>>> >>>>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <dian0511...@gmail.com> >>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Jincheng, >>>>>>>>>> >>>>>>>>>> Thanks for the reply and also looking forward to the feedback from >>>> the >>>>>>>>>> community. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Dian >>>>>>>>>> >>>>>>>>>>> 在 2019年11月11日,下午2:34,jincheng sun <sunjincheng...@gmail.com> 写道: >>>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> +1, Thanks for bring up this discussion Dian! >>>>>>>>>>> >>>>>>>>>>> The Resource Management is very important for PyFlink UDF. So, >> It's >>>>>>>> great >>>>>>>>>>> if anyone can add more comments or inputs in the design doc or >>>>>>>> feedback >>>>>>>>>> in >>>>>>>>>>> ML. :) >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jincheng >>>>>>>>>>> >>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年11月5日周二 上午11:32写道: >>>>>>>>>>> >>>>>>>>>>>> Hi everyone, >>>>>>>>>>>> >>>>>>>>>>>> In FLIP-58[1] it will add the support of Python user-defined >>>>>>>> stateless >>>>>>>>>>>> function for Python Table API. It will launch a separate Python >>>>>>>> process >>>>>>>>>> for >>>>>>>>>>>> Python user-defined function execution. The resources used by >> the >>>>>>>> Python >>>>>>>>>>>> process should be managed properly by Flink’s resource >> management >>>>>>>>>>>> framework. FLIP-49[2] has proposed a unified memory management >>>>>>>> framework >>>>>>>>>>>> and PyFlink user-defined function resource management should be >>>>>>>> based on >>>>>>>>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline >> about >>>>>>>>>> this. I >>>>>>>>>>>> draft a design doc[3] and want to start a discussion about >> PyFlink >>>>>>>>>>>> user-defined function resource management. >>>>>>>>>>>> >>>>>>>>>>>> Welcome any comments on the design doc or giving us feedback on >>>> the >>>>>>>> ML >>>>>>>>>>>> directly. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Dian >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table >>>>>>>>>>>> [2] >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors >>>>>>>>>>>> [3] >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Best, Jingsong Lee >>>> >>>> >>> >>> -- >>> Best, Jingsong Lee >> >> > > -- > Best, Jingsong Lee