Hi Jingsong,

Thanks for your valuable feedback. I have updated the "Example" section, which now describes how to use these options in a Python Table API program.

Thanks,
Dian

> On Dec 2, 2019, at 6:12 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
>
> Hi Dian,
>
> Thanks for your explanation.
> It might be better if you could update the document to explain the changes
> to the table layer (it's just a suggestion; it's up to you).
>
> About forwardedInputQueue in AbstractPythonScalarFunctionOperator:
> Will this queue take up a lot of memory?
> Can it occupy as much memory as buffer.memory?
> If so, are we dealing with silent use of heap memory?
> It feels a little strange to me, because the memory on the Python side is
> reserved, but the memory on the JVM side is used silently.
>
> After carefully reading your comments on the Google doc:
>> The memory used by the Java operator is currently accounted as the task
>> on-heap memory. We can revisit this if we find it's a problem in the future.
> I agree that we can ignore it for now, but we could add a note to the
> document to remind users. What do you think?
>
> Best,
> Jingsong Lee
>
> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu <dian0511...@gmail.com> wrote:
>
>> Hi Jingsong,
>>
>> Thanks a lot for your comments. Please see my replies inline below.
>>
>>> On Dec 2, 2019, at 3:47 PM, Jingsong Lee <lzljs3620...@apache.org> wrote:
>>>
>>> Hi Dian,
>>>
>>> Thanks for driving this. I have some questions:
>>>
>>> - Where should these configurations belong? You mentioned Table API/SQL,
>>> so should they be in TableConfig?
>>
>> All Python-related configurations are defined in PythonOptions. Users can
>> set them via TableConfig.getConfiguration.setXXX for Python Table API
>> programs.
>>
>>> - If they are only for Table/SQL, should they be called table.python.****,
>>> since all config options in the table module are called table.***?
>>
>> These configurations are not table specific. They will be used for both
>> Python Table API programs and Python DataStream API programs (which are
>> planned to be supported in the future).
>> So python.xxx seems more appropriate. What do you think?
>>
>>> - What should the table module do? In CommonPythonCalc, should we read the
>>> options from the table config and set the resources on
>>> OneInputTransformation?
>>
>> As described in the design doc, in the compilation phase, for batch jobs,
>> the required memory of the Python worker will be calculated from the
>> configuration and set as the managed memory of the operator. For stream
>> jobs, the resource spec will be unknown. (The reason is that the resources
>> of all operators in stream jobs are currently unknown, and configuring both
>> known and unknown resources in a single job is not supported.)
>>
>>> - Is all of buffer.memory off-heap memory? I took a look at
>>> AbstractPythonScalarFunctionOperator, and there is a forwardedInputQueue.
>>> Is this a heap queue? If so, do we need heap memory too?
>>
>> Yes, it is all off-heap memory, which is supposed to be used by the
>> Python process. The forwardedInputQueue is a buffer used in the Java
>> operator, and its memory is accounted as on-heap memory.
>>
>> Regards,
>> Dian
>>
>>> Hope to get your reply.
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>
>>>> Thanks for your votes and feedback. I have discussed this with @Zhu Zhu
>>>> offline and also on the design doc.
>>>>
>>>> It seems that we have reached consensus on the design. I will bring up
>>>> the VOTE if there is no other feedback.
>>>>
>>>> Thanks,
>>>> Dian
>>>>
>>>>> On Nov 22, 2019, at 2:51 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
>>>>>
>>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>>>> It is great to make sure that the resources used by the Python process
>>>>> are managed properly by Flink’s resource management framework.
>>>>>
>>>>> Also, thanks to everyone who is working on the unified memory management
>>>>> framework.
>>>>>
>>>>> Best, Hequn
>>>>>
>>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo <karma...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for driving this discussion, Dian!
>>>>>>
>>>>>> +1 for this proposal. It will help to reduce container failures due to
>>>>>> memory overuse.
>>>>>> Some comments are left in the design doc.
>>>>>>
>>>>>> Best,
>>>>>> Yangze Guo
>>>>>>
>>>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>>
>>>>>>> +1 for the general proposal.
>>>>>>>
>>>>>>> One reminder: to use the UNKNOWN resource requirement, we need to make
>>>>>>> sure the optimizer knows which operators use off-heap managed memory,
>>>>>>> and computes and sets a fraction for those operators. See FLIP-53[1]
>>>>>>> for more details. I would suggest double-checking with @Zhu Zhu, who
>>>>>>> works on this part.
>>>>>>>
>>>>>>> Thank you~
>>>>>>>
>>>>>>> Xintong Song
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>>>>
>>>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Jincheng,
>>>>>>>>
>>>>>>>> Thanks for the reply. I am also looking forward to feedback from the
>>>>>>>> community.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Dian
>>>>>>>>
>>>>>>>>> On Nov 11, 2019, at 2:34 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> +1. Thanks for bringing up this discussion, Dian!
>>>>>>>>>
>>>>>>>>> Resource management is very important for PyFlink UDFs, so it would
>>>>>>>>> be great if anyone could add more comments or input in the design
>>>>>>>>> doc, or give feedback on the ML.
>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jincheng
>>>>>>>>>
>>>>>>>>> On Tue, Nov 5, 2019 at 11:32 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> FLIP-58[1] will add support for Python user-defined stateless
>>>>>>>>>> functions in the Python Table API. It will launch a separate Python
>>>>>>>>>> process for Python user-defined function execution. The resources
>>>>>>>>>> used by the Python process should be managed properly by Flink’s
>>>>>>>>>> resource management framework. FLIP-49[2] has proposed a unified
>>>>>>>>>> memory management framework, and PyFlink user-defined function
>>>>>>>>>> resource management should be based on it. Jincheng, Hequn, Xintong,
>>>>>>>>>> GuoWei and I discussed this offline. I have drafted a design doc[3]
>>>>>>>>>> and would like to start a discussion about PyFlink user-defined
>>>>>>>>>> function resource management.
>>>>>>>>>>
>>>>>>>>>> Any comments on the design doc, or feedback directly on the ML, are
>>>>>>>>>> welcome.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Dian
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>>>>>>>> [3] https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
>>>
>>> --
>>> Best, Jingsong Lee
>
> --
> Best, Jingsong Lee
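
For readers following the thread, the rule Dian describes (for batch jobs the Python worker's required memory is derived from the configuration and requested as managed memory, while for stream jobs the resource spec stays unknown) can be sketched in plain Python. This is only an illustration and not Flink code. The option keys, the default sizes, and the idea that the requirement is the sum of a framework part and a buffer part are assumptions made for the sketch; the thread itself only mentions buffer.memory and the python.xxx prefix.

```python
from typing import Dict, Optional

# Hypothetical option keys that follow the "python.xxx" naming from the thread.
# They are assumptions for this sketch, not names confirmed by the discussion.
FRAMEWORK_MEMORY_KEY = "python.fn-execution.framework.memory.size"
BUFFER_MEMORY_KEY = "python.fn-execution.buffer.memory.size"

# Assumed default sizes, used only when a key is missing from the config.
ASSUMED_DEFAULTS = {FRAMEWORK_MEMORY_KEY: "64mb", BUFFER_MEMORY_KEY: "15mb"}

_UNIT_FACTORS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_memory_size(text: str) -> int:
    """Parse a size such as '64mb' into bytes; a bare number means bytes."""
    value = text.strip().lower()
    for suffix, factor in _UNIT_FACTORS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


def python_worker_managed_memory(config: Dict[str, str], is_batch: bool) -> Optional[int]:
    """Return the bytes to request as managed memory, or None for 'unknown'.

    Batch jobs: sum the configured framework and buffer sizes (an assumed formula).
    Stream jobs: return None to model the unknown resource spec from the thread.
    """
    if not is_batch:
        return None
    total = 0
    for key, default in ASSUMED_DEFAULTS.items():
        total += parse_memory_size(config.get(key, default))
    return total


if __name__ == "__main__":
    conf = {BUFFER_MEMORY_KEY: "32mb"}
    print(python_worker_managed_memory(conf, is_batch=True))
    print(python_worker_managed_memory(conf, is_batch=False))
```

In a real job the values would come from the configuration reachable through TableConfig, as described above. Returning None for stream jobs mirrors the point that a single job cannot mix known and unknown resource specs.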