Hi all,

I have updated the FLIP, removed the content related to UDAF, and changed the 
title of the FLIP to "Flink Python User-Defined Stateless Function for Table". 
Does that make sense to you? 

Regards,
Dian

> On Sep 6, 2019, at 6:09 PM, Dian Fu <dian0511...@gmail.com> wrote:
> 
> Hi all,
> 
> Thanks a lot for the discussion here. It makes sense to limit the scope of 
> this FLIP to only ScalarFunction. I'll update the FLIP and remove the content 
> relating to UDAF.
> 
> Thanks,
> Dian
> 
>> On Sep 6, 2019, at 6:02 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> Sure, to ensure this makes the Flink 1.10 release, let's split the FLIPs, and
>> FLIP-58 will only cover the stateless part.
>> 
>> Cheers,
>> Jincheng
>> 
>> On Fri, Sep 6, 2019 at 5:53 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>> 
>>> Hi,
>>> 
>>> Regarding stateful functions and MapView/DataView/ListView: I think it’s
>>> best to keep that for a later FLIP and focus on a more basic version.
>>> Supporting stateful functions, especially with MapView can potentially be
>>> very slow so we have to see what we can do there.
>>> 
>>> For the method names, I don’t know. If FLIP-64 passes they have to be
>>> changed. So we could use the final names right away, but I’m also fine with
>>> using the old method names for now.
>>> 
>>> Best,
>>> Aljoscha
>>> 
>>>> On 5. Sep 2019, at 12:40, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>> 
>>>> Hi Aljoscha,
>>>> 
>>>> Thanks for your comments!
>>>> 
>>>> Regarding the FLIP scope, it seems that we have agreed on the design of
>>>> the stateless function support.
>>>> What do you think about starting the development of the stateless function
>>>> support first and continuing the discussion of stateful function support?
>>>> Or do you think we should split the current FLIP into two FLIPs and discuss
>>>> the stateful function support in another thread?
>>>> 
>>>> Currently, the design of the Python DataView/MapView/ListView interfaces
>>>> follows the Java/Scala naming conventions.
>>>> Of course, we can continue to discuss whether there are better solutions,
>>>> e.g. using annotations.
>>>> 
>>>> Regarding the magic logic to support DataView/MapView/ListView, it will
>>>> be done by the framework and is transparent to users.
>>>> As I understand it, the magic logic is unavoidable no matter what the
>>>> interfaces will be.
>>>> 
>>>> Regarding the catalog support for Python functions:
>>>> 1) If it's stored in memory as a temporary object, just as you said, users
>>>> can call TableEnvironment.register_function (which will change to
>>>> register_temporary_function in FLIP-64).
>>>> 2) If it's persisted in external storage, users can call
>>>> Catalog.create_function. There will be no API change, as I understand it.
>>>> 
>>>> What do you think?
>>>> Best,
>>>> Jincheng
>>>> 
>>>> On Thu, Sep 5, 2019 at 5:32 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Another thing to consider is the scope of the FLIP. Currently, we try to
>>>>> support (stateful) AggregateFunctions. I have some concerns about whether
>>>>> or not DataView/MapView/ListView is a good interface because it requires
>>>>> quite some magic from the runners to make it work, such as messing with the
>>>>> TypeInformation and injecting objects at runtime. If the FLIP aims for the
>>>>> minimum of ScalarFunctions and the whole execution harness, that should be
>>>>> easier to agree on.
>>>>> 
>>>>> Another point is the naming of the new methods. I think Timo hinted at the
>>>>> fact that we have to consider catalog support for functions. There is
>>>>> ongoing work about differentiating between temporary objects and objects
>>>>> that are stored in a catalog (FLIP-64 [1]). With this in mind, the method
>>>>> for registering functions should be called register_temporary_function()
>>>>> and so on. Unless we want to already think about mixing Python and Java
>>>>> functions in the catalog, which is outside the scope of this FLIP, I think.
>>>>> 
>>>>> Best,
>>>>> Aljoscha
>>>>> 
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>>>> 
>>>>> 
>>>>>> On 5. Sep 2019, at 05:01, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Aljoscha,
>>>>>> 
>>>>>> That's a good point. So far, most of the code will live in the flink-python
>>>>>> module, the rules and RelNodes will be put into both the blink and
>>>>>> flink planner modules, and some of the common interfaces required by the
>>>>>> planners will be placed in flink-table-common. I think you are right, we
>>>>>> should try to ensure the changes for this feature are minimal. We will
>>>>>> follow this principle in more detail when reviewing the PRs.
>>>>>> 
>>>>>> Many thanks for your questions and the reminder!
>>>>>> 
>>>>>> Best,
>>>>>> Jincheng
>>>>>> 
>>>>>> 
>>>>>> On Wed, Sep 4, 2019 at 8:58 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Things look interesting so far!
>>>>>>> 
>>>>>>> I had one question: Where will most of the support code for this live?
>>>>>>> Will this add the required code to flink-table-common or the different
>>>>>>> runners? Can we implement this in such a way that only a minimal amount of
>>>>>>> support code is required in the parts of the Table API (and Table API
>>>>>>> runners) that are not Python-specific?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Aljoscha
>>>>>>> 
>>>>>>>> On 4. Sep 2019, at 14:14, Timo Walther <twal...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Hi Jincheng,
>>>>>>>> 
>>>>>>>> 2. Serializability of functions: "#2 is very convenient for users" holds
>>>>>>>> only until they have the first backwards-compatibility issue. After that,
>>>>>>>> they will find it not so convenient anymore and will ask why the framework
>>>>>>>> allowed storing such objects in persistent storage. I don't want to be
>>>>>>>> picky about it, but wanted to raise awareness that sometimes it is OK to
>>>>>>>> limit use cases to guide users toward developing backwards-compatible
>>>>>>>> programs.
>>>>>>>> 
>>>>>>>> Thanks for the explanation of the remaining items. It sounds reasonable
>>>>>>>> to me. Regarding the example with `getKind()`, I actually meant
>>>>>>>> `org.apache.flink.table.functions.ScalarFunction#getKind`: we don't allow
>>>>>>>> users to override this property. And I think we should do something similar
>>>>>>>> for the getLanguage property.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Timo
>>>>>>>> 
>>>>>>>> On 03.09.19 15:01, jincheng sun wrote:
>>>>>>>>> Hi Timo,
>>>>>>>>> 
>>>>>>>>> Thanks for the quick reply! :)
>>>>>>>>> I have added more examples for #3 and #5 to the FLIP. Those are great
>>>>>>>>> suggestions!
>>>>>>>>> 
>>>>>>>>> Regarding 2:
>>>>>>>>> 
>>>>>>>>> There are two kinds of serialization in CloudPickle (which is different
>>>>>>>>> from Java):
>>>>>>>>> 1) For a class or function which can be imported, CloudPickle only
>>>>>>>>> serializes the full path of the class or function (just like a Java class
>>>>>>>>> name).
>>>>>>>>> 2) For a class or function which cannot be imported, CloudPickle will
>>>>>>>>> serialize the full content of the class or function.
>>>>>>>>> For #2, it means that we cannot just store the full path of the class and
>>>>>>>>> function.
>>>>>>>>> 
>>>>>>>>> The above serialization is recursive.
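
To illustrate the by-reference case described above, here is a minimal sketch that uses only the standard-library pickle module rather than CloudPickle itself (an assumption made so that the snippet has no third-party dependency). It shows that an importable function is stored as a module path plus a name, and that a lambda has no such path, which is the gap a by-value serializer has to fill. The helper name can_pickle_by_reference is hypothetical.

```python
import json
import pickle


def can_pickle_by_reference(obj):
    """Return True if the stdlib pickler can serialize obj by its import path."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Non-importable objects (e.g. lambdas) have no stable import path.
        return False


# Case 1: an importable function is stored as module name + function name only.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)

# Case 2: a lambda cannot be looked up by name, so a by-reference pickler
# rejects it; a by-value serializer has to ship the function body instead.
add_one = lambda i: i + 1
lambda_by_reference = can_pickle_by_reference(add_one)
```

Because the payload in case 1 holds only names, loading it later depends on the same module path still being importable, which is the backwards-compatibility concern raised in this thread.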
>>>>>>>>> 
>>>>>>>>> However, there is indeed a backwards-compatibility problem when the
>>>>>>>>> module path of the parent class changes. But I think this is a rare case
>>>>>>>>> and acceptable. E.g., in the Flink framework we never change the module
>>>>>>>>> path of a user interface if we want to keep backwards compatibility. For
>>>>>>>>> user code, if they change the interface of a UDF's parent, they should
>>>>>>>>> re-register their functions.
>>>>>>>>> 
>>>>>>>>> If we do not want to support #2, we can store the full path of the class
>>>>>>>>> and function, in which case we have no backwards-compatibility problem.
>>>>>>>>> But I think #2 is very convenient for users.
>>>>>>>>> 
>>>>>>>>> What do you think?
>>>>>>>>> 
>>>>>>>>> Regarding 4:
>>>>>>>>> As I mentioned earlier, there may be built-in Python functions, and I
>>>>>>>>> think language is a "function" concept. Function and language are
>>>>>>>>> orthogonal concepts.
>>>>>>>>> We may have R, Go and other language functions in the future, not only
>>>>>>>>> user-defined, but also built-in functions.
>>>>>>>>> 
>>>>>>>>> You are right that users will not set this method, and for Python
>>>>>>>>> functions it will be set in the code-generated Java function by the
>>>>>>>>> framework. So, I think we should declare getLanguage() in
>>>>>>>>> FunctionDefinition for now.
>>>>>>>>> (I'm not quite sure what you mean by saying that getKind() is final in
>>>>>>>>> UserDefinedFunction?)
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Jincheng
>>>>>>>>> 
>>>>>>>>> On Tue, Sep 3, 2019 at 6:01 PM Timo Walther <twal...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Jincheng,
>>>>>>>>>> 
>>>>>>>>>> thanks for your response.
>>>>>>>>>> 
>>>>>>>>>> 2. Serializability of functions: Using some arbitrary serialization
>>>>>>>>>> format for shipping a function to a worker sounds fine to me. But once we
>>>>>>>>>> store functions in the catalog, we need to think about backwards
>>>>>>>>>> compatibility and evolution of interfaces etc. I'm not sure if
>>>>>>>>>> CloudPickle is the right long-term storage format for this. If we don't
>>>>>>>>>> think about this in advance, we are basically violating our code quality
>>>>>>>>>> guide [1] of never using Java serialization, but in the Python way. We
>>>>>>>>>> would be using the RPC serialization format for persistence.
>>>>>>>>>> 
>>>>>>>>>> 3. TableEnvironment: Can you add some examples to the FLIP? Because API
>>>>>>>>>> code like the following is not covered there:
>>>>>>>>>> 
>>>>>>>>>> self.t_env.register_function(
>>>>>>>>>>     "add_one", udf(lambda i: i + 1, DataTypes.BIGINT(), DataTypes.BIGINT()))
>>>>>>>>>> self.t_env.register_function(
>>>>>>>>>>     "subtract_one", udf(SubtractOne(), DataTypes.BIGINT(), DataTypes.BIGINT()))
>>>>>>>>>> self.t_env.register_function("add", add)
>>>>>>>>>> 
>>>>>>>>>> 4. FunctionDefinition: Your response still doesn't answer my question
>>>>>>>>>> entirely. Why do we need FunctionDefinition.getLanguage() if this is a
>>>>>>>>>> "user-defined function" concept and not a "function" concept? In any
>>>>>>>>>> case, users should not be able to set this method. So it must be
>>>>>>>>>> final in UserDefinedFunction, similar to getKind().
>>>>>>>>>> 
>>>>>>>>>> 5. Function characteristics: If UserDefinedFunction is defined in
>>>>>>>>>> Python, why is it not used in your example in FLIP-58? Could you
>>>>>>>>>> extend the example in the FLIP to show how to specify these attributes?
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Timo
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> https://flink.apache.org/contributing/code-style-and-quality-java.html
>>>>>>>>>> 
>>>>>>>>>> On 02.09.19 15:35, jincheng sun wrote:
>>>>>>>>>>> Hi Timo,
>>>>>>>>>>> 
>>>>>>>>>>> Many thanks for your feedback. I would like to share my thoughts with
>>>>>>>>>>> you inline. :)
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jincheng
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Sep 2, 2019 at 5:04 PM Timo Walther <twal...@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> the FLIP looks awesome. However, I would like to discuss the changes to
>>>>>>>>>>>> the user-facing parts again. Some feedback:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. DataViews: With the current non-annotation design for DataViews, we
>>>>>>>>>>>> cannot perform eager state declaration, right? At which point during
>>>>>>>>>>>> execution do we know which state is required by the function? We need to
>>>>>>>>>>>> instantiate the function first, right?
>>>>>>>>>>>> 
>>>>>>>>>>>>> We will analyze the Python AggregateFunction and extract the DataViews
>>>>>>>>>>> used in it. This can be done by instantiating the Python
>>>>>>>>>>> AggregateFunction, creating an accumulator by calling the method
>>>>>>>>>>> create_accumulator, and then analyzing the created accumulator. This is
>>>>>>>>>>> actually similar to the way the codegen logic processes a Java
>>>>>>>>>>> AggregateFunction. The extracted DataViews can then be used to construct
>>>>>>>>>>> the StateDescriptors in the operator, i.e., we hold the state spec and
>>>>>>>>>>> the state descriptor id in the Java operator, and the Python worker can
>>>>>>>>>>> access the state by specifying the corresponding state descriptor id.
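
The extraction step described above could be sketched roughly as follows. Everything here is hypothetical and only for illustration: the DataView, MapView and ListView classes are stand-ins, the accumulator is assumed to be a plain dict, and the state descriptor ids are simply assigned in field order. The actual design in the FLIP may look quite different.

```python
class DataView:
    """Hypothetical marker base class for state-backed views."""


class MapView(DataView):
    def __init__(self):
        self.data = {}


class ListView(DataView):
    def __init__(self):
        self.data = []


class CountDistinct:
    """Hypothetical Python AggregateFunction with a view-backed accumulator."""

    def create_accumulator(self):
        # Assumption: the accumulator is a dict of named fields.
        return {"count": 0, "seen": MapView()}


def extract_data_views(agg_function):
    """Create an accumulator and list the DataViews found in it.

    Returns a dict mapping a state descriptor id to (field name, view type).
    """
    accumulator = agg_function.create_accumulator()
    specs = {}
    for field, value in accumulator.items():
        if isinstance(value, DataView):
            specs[len(specs)] = (field, type(value).__name__)
    return specs


state_specs = extract_data_views(CountDistinct())
```

In this sketch the function has to be instantiated before the required state is known, which is exactly the point raised in question 1.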
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 2. Serializability of functions: How do we ensure serializability of
>>>>>>>>>>>> functions for catalog persistence? In the Scala/Java API, we would like
>>>>>>>>>>>> to register classes instead of instances soon. This is the only way to
>>>>>>>>>>>> store a function properly in a catalog; otherwise we need some
>>>>>>>>>>>> serialization/deserialization logic in the function interfaces to
>>>>>>>>>>>> convert an instance to string properties.
>>>>>>>>>>>> 
>>>>>>>>>>>>> The Python function will be serialized with CloudPickle anyway in the
>>>>>>>>>>> Python API, as we need to transfer it to the Python worker, which can then
>>>>>>>>>>> deserialize it for execution. The serialized Python function can be stored
>>>>>>>>>>> in the catalog.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 3. TableEnvironment: What is the signature of `register_function(self,
>>>>>>>>>>>> name, function)`? Does it accept both a class and function? Like `class
>>>>>>>>>>>> Sum` and `def split()`? Could you add some examples for registering both
>>>>>>>>>>>> kinds of functions?
>>>>>>>>>>>> 
>>>>>>>>>>>>> What you mentioned is already supported. You can find an
>>>>>>>>>>> example in the POC code:
>>>>>>>>>>> https://github.com/dianfu/flink/commit/93f41ba173482226af7513fdec5acba72b274489#diff-34f619b31a7e38604e22a42a441fbe2fR26
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 4. FunctionDefinition: Function definition is not a user-defined
>>>>>>>>>>>> function definition. It is the highest interface for both user-defined
>>>>>>>>>>>> and built-in functions. I'm not sure if getLanguage() should be part of
>>>>>>>>>>>> this interface or one level down, which would be `UserDefinedFunction`.
>>>>>>>>>>>> Built-in functions will never be implemented in a different language. In
>>>>>>>>>>>> any case, I would vote for removing the UNKNOWN language, because it
>>>>>>>>>>>> does not solve anything. Why should a user declare a function that the
>>>>>>>>>>>> runtime cannot handle? I also find the term `JAVA` confusing for Scala
>>>>>>>>>>>> users. How about `FunctionLanguage.JVM` instead?
>>>>>>>>>>>> 
>>>>>>>>>>>>> Actually we may have built-in Python functions in the future. Regarding
>>>>>>>>>>> the following expression: py_udf1(a, b) + py_udf2(c), if there is a
>>>>>>>>>>> built-in Python function for the '+' operator, then we don't need to mix
>>>>>>>>>>> Java and Python UDFs. In this way, we can improve the execution performance.
>>>>>>>>>>> Regarding removing FunctionLanguage.UNKNOWN and renaming
>>>>>>>>>>> FunctionLanguage.Java to FunctionLanguage.JVM, that makes sense to me.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 5. Function characteristics: In the current design, function classes do
>>>>>>>>>>>> not extend from any upper class. How can users declare characteristics
>>>>>>>>>>>> that are present in `FunctionDefinition` like determinism, requirements,
>>>>>>>>>>>> or soon also monotonicity?
>>>>>>>>>>>> 
>>>>>>>>>>>>> Actually we have defined 'UserDefinedFunction', which is the base class
>>>>>>>>>>> for all user-defined functions.
>>>>>>>>>>> We can define determinism, requirements, etc. in this class.
>>>>>>>>>>> Currently, we already support declaring determinism.
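
As a rough illustration of how a characteristic such as determinism could be declared on a common base class, here is a minimal sketch. The class layout and the method name is_deterministic are assumptions made for this example; they are not taken from the FLIP or from the actual PyFlink API. SubtractOne reuses the function name from the snippet quoted earlier in this thread, and RandomOffset is made up.

```python
import random


class UserDefinedFunction:
    """Hypothetical base class for all Python user-defined functions."""

    def is_deterministic(self):
        # Default assumption: the same input always yields the same output.
        return True


class ScalarFunction(UserDefinedFunction):
    """Hypothetical base class for stateless, row-at-a-time functions."""

    def eval(self, *args):
        raise NotImplementedError


class SubtractOne(ScalarFunction):
    def eval(self, i):
        return i - 1


class RandomOffset(ScalarFunction):
    def eval(self, i):
        return i + random.randint(0, 9)

    def is_deterministic(self):
        # Overridden so a planner would know not to pre-evaluate or reuse results.
        return False
```

A plain lambda or function would have no place to carry such a flag, which seems to be why extending a base class matters for question 5.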
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Timo
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 02.09.19 03:38, Shaoxuan Wang wrote:
>>>>>>>>>>>>> Hi Jincheng, Fudian, and Aljoscha,
>>>>>>>>>>>>> I am assuming the proposed Python UDX can also be applied to Flink SQL.
>>>>>>>>>>>>> Is this correct? If yes, I would suggest titling the FLIP
>>>>>>>>>>>>> "Flink Python User-Defined Function" or "Flink Python User-Defined Function
>>>>>>>>>>>>> for Table".
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Shaoxuan
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Aug 28, 2019 at 12:22 PM jincheng sun <sunjincheng...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for the feedback Bowen!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Many thanks for creating the FLIP and bringing up the VOTE, Dian!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best, Jincheng
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Aug 28, 2019 at 11:32 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have started a voting thread [1]. Thanks a lot for your help during
>>>>>>>>>>>>>>> the creation of the FLIP, @Jincheng.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I very much appreciate your comments. I have replied to you in the design
>>>>>>>>>>>>>>> doc. As it seems that the comments don't affect the overall design, I'll
>>>>>>>>>>>>>>> not cancel the vote for now and we can continue the discussion in the
>>>>>>>>>>>>>>> design doc.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Aug 28, 2019, at 11:05 AM, Bowen Li <bowenl...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Jincheng and Dian,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Sorry for being late to the party. I took a glance at the proposal; LGTM
>>>>>>>>>>>>>>>> in general, and I left only a couple of comments.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mon, Aug 26, 2019 at 8:05 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks! It works.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Aug 27, 2019, at 10:55 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Dian, can you check if you have edit access? :)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Mon, Aug 26, 2019 at 10:52 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I appreciate the kind tips and the offer of help. I definitely need it!
>>>>>>>>>>>>>>>>>>> Could you grant me write permission for Confluence? My ID: Dian Fu
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Aug 26, 2019, at 9:53 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for your feedback Hequn & Dian.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Dian, I am glad to see that you want to help create the FLIP!
>>>>>>>>>>>>>>>>>>>> Everyone has a first time, and I am very willing to help you complete
>>>>>>>>>>>>>>>>>>>> your first FLIP creation. Here are some tips:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> - First, I'll give your account write permission for Confluence.
>>>>>>>>>>>>>>>>>>>> - Before creating the FLIP, please have a look at the FLIP Template [1].
>>>>>>>>>>>>>>>>>>>> (It's better to know more about FLIPs by reading [2].)
>>>>>>>>>>>>>>>>>>>> - Create the Flink Python UDF related JIRAs after completing the VOTE on
>>>>>>>>>>>>>>>>>>>> the FLIP. (I think you can also bring up the VOTE thread, if you want!)
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> If you encounter any problems during this period, feel free to tell me so
>>>>>>>>>>>>>>>>>>>> that we can solve them together. :)
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 11:54 AM Hequn Cheng <chenghe...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> +1 for starting the vote.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks Jincheng a lot for the discussion.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best, Hequn
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> +1 to start creating the FLIP and the VOTE on this feature. I'm willing
>>>>>>>>>>>>>>>>>>>>>> to help with creating the FLIP if you don't mind. As I haven't created
>>>>>>>>>>>>>>>>>>>>>> a FLIP before, it would be great if you could help with this. :)
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Aug 22, 2019, at 11:41 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks a lot for your feedback. If there are no more suggestions and
>>>>>>>>>>>>>>>>>>>>>>> comments, I think it's better to initiate a vote to create a FLIP for
>>>>>>>>>>>>>>>>>>>>>>> Apache Flink Python UDFs.
>>>>>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Best, Jincheng
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 15, 2019 at 12:54 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi Thomas,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your confirmation and the very important reminder about
>>>>>>>>>>>>>>>>>>>>>>>> bundle processing.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I have added a description of how to perform bundle processing from
>>>>>>>>>>>>>>>>>>>>>>>> the perspective of checkpoints and watermarks. Feel free to leave
>>>>>>>>>>>>>>>>>>>>>>>> comments if anything is not described clearly.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 14, 2019 at 10:08 AM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Thomas,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks a lot for the suggestions.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Regarding bundle processing, there is a section "Checkpoint" [1] in
>>>>>>>>>>>>>>>>>>>>>>>>> the design doc which talks about how to handle the checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>> However, I think you are right that we should talk more about it, such
>>>>>>>>>>>>>>>>>>>>>>>>> as what bundle processing is, how it affects the checkpoint and
>>>>>>>>>>>>>>>>>>>>>>>>> watermark, how to handle the checkpoint and watermark, etc.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for putting this together. The proposal is very detailed,
>>>>>>>>>>>>>>>>>>>>>>>>>> thorough and, for me as a Beam Flink runner contributor, easy to
>>>>>>>>>>>>>>>>>>>>>>>>>> understand :)
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> One thing that you should probably detail more is the bundle
>>>>>>>>>>>>>>>>>>>>>>>>>> processing. It is critically important for performance that multiple
>>>>>>>>>>>>>>>>>>>>>>>>>> elements are processed in a bundle. The default bundle size in the
>>>>>>>>>>>>>>>>>>>>>>>>>> Flink runner is 1s or 1000 elements, whichever comes first. And for
>>>>>>>>>>>>>>>>>>>>>>>>>> streaming, you can find the logic necessary to align the bundle
>>>>>>>>>>>>>>>>>>>>>>>>>> processing with watermarks and checkpointing here:
>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
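
The "1s or 1000 elements, whichever comes first" rule could be sketched like this. It is a simplified illustration, not the Beam implementation linked above: the Bundler class and its parameter names are hypothetical, and the time limit is only checked when an element arrives, whereas a real operator would presumably also use a timer and finish the bundle before a checkpoint or watermark is emitted.

```python
import time


class Bundler:
    """Buffers elements and flushes when a count or time limit is reached."""

    def __init__(self, process_bundle, max_size=1000, max_seconds=1.0,
                 clock=time.monotonic):
        self._process_bundle = process_bundle
        self._max_size = max_size
        self._max_seconds = max_seconds
        self._clock = clock
        self._buffer = []
        self._started_at = None

    def add(self, element):
        if not self._buffer:
            # The first element of a bundle starts the time window.
            self._started_at = self._clock()
        self._buffer.append(element)
        too_big = len(self._buffer) >= self._max_size
        too_old = self._clock() - self._started_at >= self._max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Finish the current bundle, e.g. before a checkpoint or watermark."""
        if self._buffer:
            self._process_bundle(self._buffer)
            self._buffer = []


bundles = []
bundler = Bundler(bundles.append, max_size=3, max_seconds=60.0)
for value in range(7):
    bundler.add(value)
bundler.flush()  # in this sketch, a checkpoint barrier would force this call
```

With a bundle size of 3, the seven elements end up in three bundles, the last one being flushed explicitly.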
>>>>>>>>>>>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> The Python Table API (without Python UDF support) is already
>>>>>>>>>>>>>>>>>>>>>>>>>>> supported and will be available in the coming release 1.9.
>>>>>>>>>>>>>>>>>>>>>>>>>>> As Python UDFs are very important for Python users, we'd like to start
>>>>>>>>>>>>>>>>>>>>>>>>>>> the discussion about Python UDF support in the Python Table API.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline and have
>>>>>>>>>>>>>>>>>>>>>>>>>>> drafted a design doc [1]. It includes the following items:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function interfaces.
>>>>>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function execution architecture.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> As mentioned by many people in the previous discussion thread [2], a
>>>>>>>>>>>>>>>>>>>>>>>>>>> portability framework was introduced in Apache Beam in recent
>>>>>>>>>>>>>>>>>>>>>>>>>>> releases. It provides well-defined, language-neutral data structures
>>>>>>>>>>>>>>>>>>>>>>>>>>> and protocols for language-neutral user-defined function execution.
>>>>>>>>>>>>>>>>>>>>>>>>>>> This design is based on Beam's portability framework. We will
>>>>>>>>>>>>>>>>>>>>>>>>>>> introduce how to make use of Beam's portability framework for
>>>>>>>>>>>>>>>>>>>>>>>>>>> user-defined function execution: data transmission, state access,
>>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint, metrics, logging, etc.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Considering that the design relies on Beam's portability framework
>>>>>>>>>>>>>>>>>>>>>>>>>>> for Python user-defined function execution, and not all the
>>>>>>>>>>>>>>>>>>>>>>>>>>> contributors in the Flink community are familiar with Beam's
>>>>>>>>>>>>>>>>>>>>>>>>>>> portability framework, we have done a prototype [3] as a proof of
>>>>>>>>>>>>>>>>>>>>>>>>>>> concept and to ease understanding of the design.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Welcome any feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
>>>>>>>>>>>>>>>>>>>>>>>>>>> [3] https://github.com/dianfu/flink/commits/udf_poc
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 