Hi, Sure, for ensure the 1.10 relesae of flink, let's split the FLIPs, and FLIP-58 only do the stateless part.
Cheers, Jincheng Aljoscha Krettek <aljos...@apache.org> 于2019年9月6日周五 下午5:53写道: > Hi, > > Regarding stateful functions and MapView/DataView/ListView: I think it’s > best to keep that for a later FLIP and focus on a more basic version. > Supporting stateful functions, especially with MapView can potentially be > very slow so we have to see what we can do there. > > For the method names, I don’t know. If FLIP-64 passes they have to be > changed. So we could use the final names right away, but I’m also fine with > using the old method names for now. > > Best, > Aljoscha > > > On 5. Sep 2019, at 12:40, jincheng sun <sunjincheng...@gmail.com> wrote: > > > > Hi Aljoscha, > > > > Thanks for your comments! > > > > Regarding to the FLIP scope, it seems that we have agreed on the design > of > > the stateless function support. > > What do you think about starting the development of the stateless > function > > support firstly and continue the discussion of stateful function support? > > Or you think we should split the current FLIP into two FLIPs and discuss > > the stateful function support in another thread? > > > > Currently, the Python DataView/MapView/ListView interfaces design follow > > the Java/Scala naming conversions. > > Of couse, We can continue to discuss whether there are better solutions, > > i.e. using annotations. > > > > Regarding to the magic logic to support DataView/MapView/ListView, it > will > > be done by the framework and is transparent for users. > > Per my understanding, the magic logic is unavoidable no matter what the > > interfaces will be. > > > > Regarding to the catalog support of python function:1) If it's stored in > > memory as temporary object, just as you said, users can call > > TableEnvironment.register_function(will change to > > register_temporary_function in FLIP-64) > > 2) If it's persisted in external storage, users can call > > Catalog.create_function. There will be no API change per my > understanding. > > > > What do you think? > > Best,Jincheng > > > > Aljoscha Krettek <aljos...@apache.org> 于2019年9月5日周四 下午5:32写道: > > > >> Hi, > >> > >> Another thing to consider is the Scope of the FLIP. Currently, we try to > >> support (stateful) AggregateFunctions. I have some concerns about > whether > >> or not DataView/MapView/ListView is a good interface because it requires > >> quite some magic from the runners to make it work, such as messing with > the > >> TypeInformation and injecting objects at runtime. If the FLIP aims for > the > >> minimum of ScalarFunctions and the whole execution harness, that should > be > >> easier to agree on. > >> > >> Another point is the naming of the new methods. I think Timo hinted at > the > >> fact that we have to consider catalog support for functions. There is > >> ongoing work about differentiating between temporary objects and objects > >> that are stored in a catalog (FLIP-64 [1]). With this in mind, the > method > >> for registering functions should be called register_temporary_function() > >> and so on. Unless we want to already think about mixing Python and Java > >> functions in the catalog, which is outside the scope of this FLIP, I > think. > >> > >> Best, > >> Aljoscha > >> > >> [1] > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >> > >> > >>> On 5. Sep 2019, at 05:01, jincheng sun <sunjincheng...@gmail.com> > wrote: > >>> > >>> Hi Aljoscha, > >>> > >>> That's a good points, so far, most of the code will live in > flink-python > >>> module, and the rules and relNodes will be put into the both blink and > >>> flink planner modules, some of the common interface of required by > >> planners > >>> will be placed in flink-table-common. I think you are right, we should > >> try > >>> to ensure the changes of this feature is minimal. For more detail we > >> would > >>> follow this principle when review the PRs. > >>> > >>> Great thanks for your questions and remind! > >>> > >>> Best, > >>> Jincheng > >>> > >>> > >>> Aljoscha Krettek <aljos...@apache.org> 于2019年9月4日周三 下午8:58写道: > >>> > >>>> Hi, > >>>> > >>>> Things looks interesting so far! > >>>> > >>>> I had one question: Where will most of the support code for this live? > >>>> Will this add the required code to flink-table-common or the different > >>>> runners? Can we implement this in such a way that only a minimal > amount > >> of > >>>> support code is required in the parts of the Table API (and Table API > >>>> runners) that are not python specific? > >>>> > >>>> Best, > >>>> Aljoscha > >>>> > >>>>> On 4. Sep 2019, at 14:14, Timo Walther <twal...@apache.org> wrote: > >>>>> > >>>>> Hi Jincheng, > >>>>> > >>>>> 2. Serializability of functions: "#2 is very convenient for users" > >> means > >>>> only until they have the first backwards-compatibility issue, after > that > >>>> they will find it not so convinient anymore and will ask why the > >> framework > >>>> allowed storing such objects in a persistent storage. I don't want to > be > >>>> picky about it, but wanted to raise awareness that sometimes it is ok > to > >>>> limit use cases to guide users for devloping backwards-compatible > >> programs. > >>>>> > >>>>> Thanks for the explanation fo the remaining items. It sounds > reasonable > >>>> to me. Regarding the example with `getKind()`, I actually meant > >>>> `org.apache.flink.table.functions.ScalarFunction#getKind` we don't > allow > >>>> users to override this property. And I think we should do something > >> similar > >>>> for the getLanguage property. > >>>>> > >>>>> Thanks, > >>>>> Timo > >>>>> > >>>>> On 03.09.19 15:01, jincheng sun wrote: > >>>>>> Hi Timo, > >>>>>> > >>>>>> Thanks for the quick reply ! :) > >>>>>> I have added more example for #3 and #5 to the FLIP. That are great > >>>>>> suggestions ! > >>>>>> > >>>>>> Regarding 2: > >>>>>> > >>>>>> There are two kind Serialization for CloudPickle(Which is different > >> from > >>>>>> Java): > >>>>>> 1) For class and function which can be imported, CloudPickle only > >>>>>> serialize the full path of the class and function (just like java > >> class > >>>>>> name). > >>>>>> 2) For the class and function which can not be imported, CloudPickle > >>>> will > >>>>>> serialize the full content of the class and function. > >>>>>> For #2, It means that we can not just store the full path of the > class > >>>> and > >>>>>> function. > >>>>>> > >>>>>> The above serialization is recursive. > >>>>>> > >>>>>> However, there is indeed an problem of backwards compatibility when > >> the > >>>>>> module path of the parent class changed. But I think this is an rare > >>>> case > >>>>>> and acceptable. i.e., For Flink framework we never change the user > >>>>>> interface module path if we want to keep backwards compatibility. > For > >>>> user > >>>>>> code, if they change the interface of UDF's parent, they should > >>>> re-register > >>>>>> their functions. > >>>>>> > >>>>>> If we do not want support #2, we can store the full path of class > and > >>>>>> function, in that case we have no backwards compatibility problem. > >> But I > >>>>>> think the #2 is very convenient for users. > >>>>>> > >>>>>> What do you think? > >>>>>> > >>>>>> Regarding 4: > >>>>>> As I mentioned earlier, there may be built-in Python functions and I > >>>> think > >>>>>> language is a "function" concept. Function and Language are > orthogonal > >>>>>> concepts. > >>>>>> We may have R, GO and other language functions in the future, not > only > >>>>>> user-defined, but also built-in functions. > >>>>>> > >>>>>> You are right that users will not set this method and for Python > >>>> functions, > >>>>>> it will be set in the code-generated Java function by the framework. > >>>> So, I > >>>>>> think we should declare the getLanguage() in FunctionDefinition for > >> now. > >>>>>> (I'm not pretty sure what do you mean by saying that getKind() is > >> final > >>>> in > >>>>>> UserDefinedFunction?) > >>>>>> > >>>>>> Best, > >>>>>> Jincheng > >>>>>> > >>>>>> Timo Walther <twal...@apache.org> 于2019年9月3日周二 下午6:01写道: > >>>>>> > >>>>>>> Hi Jincheng, > >>>>>>> > >>>>>>> thanks for your response. > >>>>>>> > >>>>>>> 2. Serializability of functions: Using some arbitrary serialization > >>>>>>> format for shipping a function to worker sounds fine to me. But > once > >> we > >>>>>>> store functions a the catalog we need to think about backwards > >>>>>>> compatibility and evolution of interfaces etc. I'm not sure if > >>>>>>> CloudPickle is the right long-term storage format for this. If we > >> don't > >>>>>>> think about this in advance, we are basically violating our code > >>>> quality > >>>>>>> guide [1] of never use Java Serialization but in the Python-way. We > >> are > >>>>>>> using the RPC serialization for persistence. > >>>>>>> > >>>>>>> 3. TableEnvironment: Can you add some example to the FLIP? Because > >> API > >>>>>>> code like the following is not covered there: > >>>>>>> > >>>>>>> self.t_env.register_function("add_one", udf(lambda i: i + 1, > >>>>>>> DataTypes.BIGINT(), > >>>>>>> DataTypes.BIGINT())) > >>>>>>> self.t_env.register_function("subtract_one", udf(SubtractOne(), > >>>>>>> DataTypes.BIGINT(), > >>>>>>> DataTypes.BIGINT())) > >>>>>>> self.t_env.register_function("add", add) > >>>>>>> > >>>>>>> 4. FunctionDefinition: Your response still doesn't answer my > question > >>>>>>> entirely. Why do we need FunctionDefinition.getLanguage() if this > is > >> a > >>>>>>> "user-defined function" concept and not a "function" concept. In > any > >>>>>>> case, all users should not be able to set this method. So it must > be > >>>>>>> final in UserDefinedFunction similar to getKind(). > >>>>>>> > >>>>>>> 5. Function characteristics: If UserDefinedFunction is defined in > >>>>>>> Python, why is it not used in your example in FLIP-58. You could > you > >>>>>>> extend the example to show how to specify these attributes in the > >> FLIP? > >>>>>>> > >>>>>>> Regards, > >>>>>>> Timo > >>>>>>> > >>>>>>> [1] > >>>> > https://flink.apache.org/contributing/code-style-and-quality-java.html > >>>>>>> > >>>>>>> On 02.09.19 15:35, jincheng sun wrote: > >>>>>>>> Hi Timo, > >>>>>>>> > >>>>>>>> Great thanks for your feedback. I would like to share my thoughts > >> with > >>>>>>> you > >>>>>>>> inline. :) > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jincheng > >>>>>>>> > >>>>>>>> Timo Walther <twal...@apache.org> 于2019年9月2日周一 下午5:04写道: > >>>>>>>> > >>>>>>>>> Hi all, > >>>>>>>>> > >>>>>>>>> the FLIP looks awesome. However, I would like to discuss the > >> changes > >>>> to > >>>>>>>>> the user-facing parts again. Some feedback: > >>>>>>>>> > >>>>>>>>> 1. DataViews: With the current non-annotation design for > DataViews, > >>>> we > >>>>>>>>> cannot perform eager state declaration, right? At which point > >> during > >>>>>>>>> execution do we know which state is required by the function? We > >>>> need to > >>>>>>>>> instantiate the function first, right? > >>>>>>>>> > >>>>>>>>>> We will analysis the Python AggregateFunction and extract the > >>>> DataViews > >>>>>>>> used in the Python AggregateFunction. This can be done > >>>>>>>> by instantiate a Python AggregateFunction, creating an accumulator > >> by > >>>>>>>> calling method create_accumulator and then analysis the created > >>>>>>>> accumulator. This is actually similar to the way that Java > >>>>>>>> AggregateFunction processing codegen logic. The extracted > DataViews > >>>> can > >>>>>>>> then be used to construct the StateDescriptors in the operator, > >> i.e., > >>>> we > >>>>>>>> should have hold the state spec and the state descriptor id in > Java > >>>>>>>> operator and Python worker can access the state by specifying the > >>>>>>>> corresponding state descriptor id. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> 2. Serializability of functions: How do we ensure serializability > >> of > >>>>>>>>> functions for catalog persistence? In the Scala/Java API, we > would > >>>> like > >>>>>>>>> to register classes instead of instances soon. This is the only > way > >>>> to > >>>>>>>>> store a function properly in a catalog or we need some > >>>>>>>>> serialization/deserialization logic in the function interfaces to > >>>>>>>>> convert an instance to string properties. > >>>>>>>>> > >>>>>>>>>> The Python function will be serialized with CloudPickle anyway > in > >>>> the > >>>>>>>> Python API as we need to transfer it to the Python worker which > can > >>>> then > >>>>>>>> deserialize it for execution. The serialized Python function can > be > >>>>>>> stored > >>>>>>>> into catalog. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> 3. TableEnvironment: What is the signature of > >>>> `register_function(self, > >>>>>>>>> name, function)`? Does it accept both a class and function? Like > >>>> `class > >>>>>>>>> Sum` and `def split()`? Could you add some examples for > registering > >>>> both > >>>>>>>>> kinds of functions? > >>>>>>>>> > >>>>>>>>>> It has been already supported which you mentioned. You can find > an > >>>>>>>> example in the POC code: > >>>>>>>> > >>>>>>> > >>>> > >> > https://github.com/dianfu/flink/commit/93f41ba173482226af7513fdec5acba72b274489#diff-34f619b31a7e38604e22a42a441fbe2fR26 > >>>>>>>> > >>>>>>>> > >>>>>>>>> 4. FunctionDefinition: Function definition is not a user-defined > >>>>>>>>> function definition. It is the highest interface for both > >>>> user-defined > >>>>>>>>> and built-in functions. I'm not sure if getLanguage() should be > >> part > >>>> of > >>>>>>>>> this interface or one-level down which would be > >>>> `UserDefinedFunction`. > >>>>>>>>> Built-in functions will never be implemented in a different > >>>> language. In > >>>>>>>>> any case, I would vote for removing the UNKNOWN language, because > >> it > >>>>>>>>> does not solve anything. Why should a user declare a function > that > >>>> the > >>>>>>>>> runtime can not handle? I also find the term `JAVA` confusing for > >>>> Scala > >>>>>>>>> users. How about `FunctionLanguage.JVM` instead? > >>>>>>>>> > >>>>>>>>>> Actually we may have built-in Python functions in the future. > >>>> Regarding > >>>>>>>> to the following expression: py_udf1(a, b) + py_udf2(c), if there > is > >>>>>>>> built-in Python > >>>>>>>> funciton for '+' operator, then we don't need to mix using Java > and > >>>>>>> Python > >>>>>>>> UDFs. In this way, we can improve the execution performance. > >>>>>>>> Regarding to removing FunctionLanguage.UNKNOWN and renaming > >>>>>>>> FunctionLanguage.Java to FunctionLanguage.JVM, it makes more sense > >> to > >>>> me. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> 5. Function characteristics: In the current design, function > >> classes > >>>> do > >>>>>>>>> not extend from any upper class. How can users declare > >>>> characteristics > >>>>>>>>> that are present in `FunctionDefinition` like determinism, > >>>> requirements, > >>>>>>>>> or soon also monotonism. > >>>>>>>>> > >>>>>>>>>> Actually we have defined 'UserDefinedFunction' which is the base > >>>> class > >>>>>>>> for all user-defined functions. > >>>>>>>> We can define the deterministic, requirements, etc in this class. > >>>>>>>> Currently, we have already supported to define the deterministic. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 02.09.19 03:38, Shaoxuan Wang wrote: > >>>>>>>>>> Hi Jincheng, Fudian, and Aljoscha, > >>>>>>>>>> I am assuming the proposed python UDX can also be applied to > Flink > >>>> SQL. > >>>>>>>>>> Is this correct? If yes, I would suggest to title the FLIP as > >> "Flink > >>>>>>>>> Python > >>>>>>>>>> User-Defined Function" or "Flink Python User-Defined Function > for > >>>>>>> Table". > >>>>>>>>>> Regards, > >>>>>>>>>> Shaoxuan > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Wed, Aug 28, 2019 at 12:22 PM jincheng sun < > >>>>>>> sunjincheng...@gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Thanks for the feedback Bowen! > >>>>>>>>>>> > >>>>>>>>>>> Great thanks for create the FLIP and bring up the VOTE Dian! > >>>>>>>>>>> > >>>>>>>>>>> Best, Jincheng > >>>>>>>>>>> > >>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月28日周三 上午11:32写道: > >>>>>>>>>>> > >>>>>>>>>>>> Hi all, > >>>>>>>>>>>> > >>>>>>>>>>>> I have started a voting thread [1]. Thanks a lot for your help > >>>> during > >>>>>>>>>>>> creating the FLIP @Jincheng. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Hi Bowen, > >>>>>>>>>>>> > >>>>>>>>>>>> Very appreciated for your comments. I have replied you in the > >>>> design > >>>>>>>>> doc. > >>>>>>>>>>>> As it seems that the comments doesn't affect the overall > design, > >>>> I'll > >>>>>>>>> not > >>>>>>>>>>>> cancel the vote for now and we can continue the discussion in > >> the > >>>>>>>>> design > >>>>>>>>>>>> doc. > >>>>>>>>>>>> > >>>>>>>>>>>> [1] > >>>>>>>>>>>> > >>>>>>> > >>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html > >>>>>>>>>>>> < > >>>>>>>>>>>> > >>>>>>> > >>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html > >>>>>>>>>>>> Regards, > >>>>>>>>>>>> Dian > >>>>>>>>>>>> > >>>>>>>>>>>>> 在 2019年8月28日,上午11:05,Bowen Li <bowenl...@gmail.com> 写道: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hi Jincheng and Dian, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Sorry for being late to the party. I took a glance at the > >>>> proposal, > >>>>>>>>>>> LGTM > >>>>>>>>>>>> in > >>>>>>>>>>>>> general, and I left only a couple comments. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> Bowen > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Mon, Aug 26, 2019 at 8:05 PM Dian Fu < > dian0511...@gmail.com > >>> > >>>>>>>>> wrote: > >>>>>>>>>>>>>> Hi Jincheng, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks! It works. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> Dian > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 在 2019年8月27日,上午10:55,jincheng sun < > sunjincheng...@gmail.com> > >>>> 写道: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Dian, can you check if you have edit access? :) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月26日周一 上午10:52写道: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi Jincheng, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Appreciated for the kind tips and offering of help. > >> Definitely > >>>>>>> need > >>>>>>>>>>>> it! > >>>>>>>>>>>>>>>> Could you grant me write permission for confluence? My Id: > >>>> Dian > >>>>>>> Fu > >>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>> Dian > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 在 2019年8月26日,上午9:53,jincheng sun < > sunjincheng...@gmail.com > >>> > >>>> 写道: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks for your feedback Hequn & Dian. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Dian, I am glad to see that you want help to create the > >> FLIP! > >>>>>>>>>>>>>>>>> Everyone will have first time, and I am very willing to > >> help > >>>> you > >>>>>>>>>>>>>> complete > >>>>>>>>>>>>>>>>> your first FLIP creation. Here some tips: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> - First I'll give your account write permission for > >>>> confluence. > >>>>>>>>>>>>>>>>> - Before create the FLIP, please have look at the FLIP > >>>> Template > >>>>>>>>>>> [1], > >>>>>>>>>>>>>>>> (It's > >>>>>>>>>>>>>>>>> better to know more about FLIP by reading [2]) > >>>>>>>>>>>>>>>>> - Create Flink Python UDFs related JIRAs after completing > >> the > >>>>>>> VOTE > >>>>>>>>>>> of > >>>>>>>>>>>>>>>>> FLIP.(I think you also can bring up the VOTE thread, if > you > >>>>>>> want! > >>>>>>>>> ) > >>>>>>>>>>>>>>>>> Any problems you encounter during this period,feel free > to > >>>> tell > >>>>>>> me > >>>>>>>>>>>> that > >>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>> can solve them together. :) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>> Jincheng > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> [1] > >>>>>>>>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template > >>>>>>>>>>>>>>>>> [2] > >>>>>>>>>>>>>>>>> > >>>>>>> > >>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > >>>>>>>>>>>>>>>>> Hequn Cheng <chenghe...@gmail.com> 于2019年8月23日周五 > >> 上午11:54写道: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> +1 for starting the vote. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks Jincheng a lot for the discussion. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Best, Hequn > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu < > >>>>>>> dian0511...@gmail.com> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> Hi Jincheng, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> +1 to start the FLIP create and VOTE on this feature. > I'm > >>>>>>>>> willing > >>>>>>>>>>>> to > >>>>>>>>>>>>>>>> help > >>>>>>>>>>>>>>>>>>> on the FLIP create if you don't mind. As I haven't > >> created > >>>> a > >>>>>>>>> FLIP > >>>>>>>>>>>>>>>> before, > >>>>>>>>>>>>>>>>>>> it will be great if you could help on this. :) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>>> Dian > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 在 2019年8月22日,下午11:41,jincheng sun < > >>>> sunjincheng...@gmail.com> > >>>>>>>>>>> 写道: > >>>>>>>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Thanks a lot for your feedback. If there are no more > >>>>>>>>> suggestions > >>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>> comments, I think it's better to initiate a vote to > >>>> create a > >>>>>>>>>>> FLIP > >>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>>> Apache Flink Python UDFs. > >>>>>>>>>>>>>>>>>>>> What do you think? > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Best, Jincheng > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> jincheng sun <sunjincheng...@gmail.com> 于2019年8月15日周四 > >>>>>>>>>>> 上午12:54写道: > >>>>>>>>>>>>>>>>>>>>> Hi Thomas, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Thanks for your confirmation and the very important > >>>> reminder > >>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>> bundle > >>>>>>>>>>>>>>>>>>>>> processing. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I have had add the description about how to perform > >>>> bundle > >>>>>>>>>>>>>> processing > >>>>>>>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>>>>> the perspective of checkpoint and watermark. Feel > free > >> to > >>>>>>>>> leave > >>>>>>>>>>>>>>>>>>> comments if > >>>>>>>>>>>>>>>>>>>>> there are anything not describe clearly. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>> Jincheng > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> 于2019年8月14日周三 > >> 上午10:08写道: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Hi Thomas, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Thanks a lot the suggestions. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Regarding to bundle processing, there is a section > >>>>>>>>>>>> "Checkpoint"[1] > >>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>> design doc which talks about how to handle the > >>>> checkpoint. > >>>>>>>>>>>>>>>>>>>>>> However, I think you are right that we should talk > >> more > >>>>>>> about > >>>>>>>>>>>> it, > >>>>>>>>>>>>>>>>>> such > >>>>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>>>>> what's bundle processing, how it affects the > >> checkpoint > >>>> and > >>>>>>>>>>>>>>>>>> watermark, > >>>>>>>>>>>>>>>>>>> how > >>>>>>>>>>>>>>>>>>>>>> to handle the checkpoint and watermark, etc. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>> > >>>> > >> > https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3 > >>>>>>>>>>>>>>>>>>>>>> < > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>> > >>>> > >> > https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3 > >>>>>>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>>>>>> Dian > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> 在 2019年8月14日,上午1:01,Thomas Weise <t...@apache.org> > >> 写道: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Hi Jincheng, > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Thanks for putting this together. The proposal is > >> very > >>>>>>>>>>>> detailed, > >>>>>>>>>>>>>>>>>>>>>> thorough > >>>>>>>>>>>>>>>>>>>>>>> and for me as a Beam Flink runner contributor easy > to > >>>>>>>>>>>> understand > >>>>>>>>>>>>>> :) > >>>>>>>>>>>>>>>>>>>>>>> One thing that you should probably detail more is > the > >>>>>>> bundle > >>>>>>>>>>>>>>>>>>>>>> processing. It > >>>>>>>>>>>>>>>>>>>>>>> is critically important for performance that > multiple > >>>>>>>>>>> elements > >>>>>>>>>>>>>> are > >>>>>>>>>>>>>>>>>>>>>>> processed in a bundle. The default bundle size in > the > >>>>>>> Flink > >>>>>>>>>>>>>> runner > >>>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>>> 1s or > >>>>>>>>>>>>>>>>>>>>>>> 1000 elements, whichever comes first. And for > >>>> streaming, > >>>>>>> you > >>>>>>>>>>>> can > >>>>>>>>>>>>>>>>>> find > >>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>> logic necessary to align the bundle processing with > >>>>>>>>>>> watermarks > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>> checkpointing here: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>> > >>>> > >> > https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java > >>>>>>>>>>>>>>>>>>>>>>> Thomas > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun < > >>>>>>>>>>>>>>>>>>> sunjincheng...@gmail.com> > >>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> The Python Table API(without Python UDF support) > has > >>>>>>>>> already > >>>>>>>>>>>>>> been > >>>>>>>>>>>>>>>>>>>>>> supported > >>>>>>>>>>>>>>>>>>>>>>>> and will be available in the coming release 1.9. > >>>>>>>>>>>>>>>>>>>>>>>> As Python UDF is very important for Python users, > >> we'd > >>>>>>> like > >>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> start > >>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>> discussion about the Python UDF support in the > >> Python > >>>>>>> Table > >>>>>>>>>>>> API. > >>>>>>>>>>>>>>>>>>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed > >> offline > >>>>>>> and > >>>>>>>>>>>> have > >>>>>>>>>>>>>>>>>>>>>> drafted a > >>>>>>>>>>>>>>>>>>>>>>>> design doc[1]. It includes the following items: > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function interfaces. > >>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function execution > architecture. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> As mentioned by many guys in the previous > discussion > >>>>>>>>>>>> thread[2], > >>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>>>>>> portability framework was introduced in Apache > Beam > >> in > >>>>>>>>>>> latest > >>>>>>>>>>>>>>>>>>>>>> releases. It > >>>>>>>>>>>>>>>>>>>>>>>> provides well-defined, language-neutral data > >>>> structures > >>>>>>> and > >>>>>>>>>>>>>>>>>> protocols > >>>>>>>>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>>>>>>> language-neutral user-defined function execution. > >> This > >>>>>>>>>>> design > >>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>> based > >>>>>>>>>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>>>>>>>>>> Beam's portability framework. We will introduce > how > >> to > >>>>>>> make > >>>>>>>>>>>> use > >>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>>>>> Beam's > >>>>>>>>>>>>>>>>>>>>>>>> portability framework for user-defined function > >>>>>>> execution: > >>>>>>>>>>>> data > >>>>>>>>>>>>>>>>>>>>>>>> transmission, state access, checkpoint, metrics, > >>>> logging, > >>>>>>>>>>> etc. > >>>>>>>>>>>>>>>>>>>>>>>> Considering that the design relies on Beam's > >>>> portability > >>>>>>>>>>>>>> framework > >>>>>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>>>>>>> Python user-defined function execution and not all > >> the > >>>>>>>>>>>>>>>> contributors > >>>>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>>>>>>> Flink community are familiar with Beam's > portability > >>>>>>>>>>>> framework, > >>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>> have > >>>>>>>>>>>>>>>>>>>>>>>> done a prototype[3] for proof of concept and also > >>>> ease of > >>>>>>>>>>>>>>>>>>>>>> understanding of > >>>>>>>>>>>>>>>>>>>>>>>> the design. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Welcome any feedback. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>>>>> Jincheng > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>> > >>>> > >> > https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing > >>>>>>>>>>>>>>>>>>>>>>>> [2] > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>> > >>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html > >>>>>>>>>>>>>>>>>>>>>>>> [3] > https://github.com/dianfu/flink/commits/udf_poc > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>> > >>>>> > >>>> > >>>> > >> > >> > >