Hi,

Sure. To make sure we are on track for the Flink 1.10 release, let's split
the FLIPs and have FLIP-58 cover only the stateless part.

Cheers,
Jincheng

Aljoscha Krettek <aljos...@apache.org> wrote on Fri, Sep 6, 2019 at 5:53 PM:

> Hi,
>
> Regarding stateful functions and MapView/DataView/ListView: I think it’s
> best to keep that for a later FLIP and focus on a more basic version.
> Supporting stateful functions, especially with MapView can potentially be
> very slow so we have to see what we can do there.
>
> For the method names, I don’t know. If FLIP-64 passes they have to be
> changed. So we could use the final names right away, but I’m also fine with
> using the old method names for now.
>
> Best,
> Aljoscha
>
> > On 5. Sep 2019, at 12:40, jincheng sun <sunjincheng...@gmail.com> wrote:
> >
> > Hi Aljoscha,
> >
> > Thanks for your comments!
> >
> > Regarding the FLIP scope, it seems that we have agreed on the design of
> > the stateless function support.
> > What do you think about starting development of the stateless function
> > support first and continuing the discussion of stateful function support?
> > Or do you think we should split the current FLIP into two FLIPs and discuss
> > the stateful function support in another thread?
> >
> > Currently, the design of the Python DataView/MapView/ListView interfaces
> > follows the Java/Scala naming conventions.
> > Of course, we can continue to discuss whether there are better solutions,
> > e.g. using annotations.
> >
> > Regarding the magic logic to support DataView/MapView/ListView, it will
> > be done by the framework and is transparent to users.
> > Per my understanding, the magic logic is unavoidable no matter what the
> > interfaces will be.
> >
> > Regarding catalog support for Python functions:
> > 1) If it's stored in memory as a temporary object, just as you said, users
> > can call TableEnvironment.register_function (which will change to
> > register_temporary_function in FLIP-64).
> > 2) If it's persisted in external storage, users can call
> > Catalog.create_function. There will be no API change per my understanding.
> >
> > What do you think?
> >
> > Best,
> > Jincheng
> >
> > Aljoscha Krettek <aljos...@apache.org> wrote on Thu, Sep 5, 2019 at 5:32 PM:
> >
> >> Hi,
> >>
> >> Another thing to consider is the scope of the FLIP. Currently, we try to
> >> support (stateful) AggregateFunctions. I have some concerns about whether
> >> or not DataView/MapView/ListView is a good interface because it requires
> >> quite some magic from the runners to make it work, such as messing with
> >> the TypeInformation and injecting objects at runtime. If the FLIP aims for
> >> the minimum of ScalarFunctions and the whole execution harness, that
> >> should be easier to agree on.
> >>
> >> Another point is the naming of the new methods. I think Timo hinted at the
> >> fact that we have to consider catalog support for functions. There is
> >> ongoing work about differentiating between temporary objects and objects
> >> that are stored in a catalog (FLIP-64 [1]). With this in mind, the method
> >> for registering functions should be called register_temporary_function()
> >> and so on. Unless we want to already think about mixing Python and Java
> >> functions in the catalog, which is outside the scope of this FLIP, I think.
> >>
> >> Best,
> >> Aljoscha
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>
> >>
> >>> On 5. Sep 2019, at 05:01, jincheng sun <sunjincheng...@gmail.com>
> wrote:
> >>>
> >>> Hi Aljoscha,
> >>>
> >>> That's a good point. So far, most of the code will live in the
> >>> flink-python module, the rules and RelNodes will be put into both the
> >>> blink and flink planner modules, and some of the common interfaces
> >>> required by the planners will be placed in flink-table-common. I think
> >>> you are right, we should try to ensure that the changes for this feature
> >>> are minimal. In more detail, we will follow this principle when
> >>> reviewing the PRs.
> >>>
> >>> Many thanks for your questions and the reminder!
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>>
> >>> Aljoscha Krettek <aljos...@apache.org> wrote on Wed, Sep 4, 2019 at 8:58 PM:
> >>>
> >>>> Hi,
> >>>>
> >>>> Things look interesting so far!
> >>>>
> >>>> I had one question: Where will most of the support code for this live?
> >>>> Will this add the required code to flink-table-common or the different
> >>>> runners? Can we implement this in such a way that only a minimal amount
> >>>> of support code is required in the parts of the Table API (and Table API
> >>>> runners) that are not Python specific?
> >>>>
> >>>> Best,
> >>>> Aljoscha
> >>>>
> >>>>> On 4. Sep 2019, at 14:14, Timo Walther <twal...@apache.org> wrote:
> >>>>>
> >>>>> Hi Jincheng,
> >>>>>
> >>>>> 2. Serializability of functions: "#2 is very convenient for users"
> >>>>> holds only until they have the first backwards-compatibility issue.
> >>>>> After that they will not find it so convenient anymore and will ask why
> >>>>> the framework allowed storing such objects in persistent storage. I
> >>>>> don't want to be picky about it, but wanted to raise awareness that
> >>>>> sometimes it is OK to limit use cases to guide users towards developing
> >>>>> backwards-compatible programs.
> >>>>>
> >>>>> Thanks for the explanation of the remaining items. It sounds reasonable
> >>>>> to me. Regarding the example with `getKind()`, I actually meant
> >>>>> `org.apache.flink.table.functions.ScalarFunction#getKind`: we don't
> >>>>> allow users to override this property. And I think we should do
> >>>>> something similar for the getLanguage property.
> >>>>>
> >>>>> Thanks,
> >>>>> Timo
> >>>>>
> >>>>> On 03.09.19 15:01, jincheng sun wrote:
> >>>>>> Hi Timo,
> >>>>>>
> >>>>>> Thanks for the quick reply! :)
> >>>>>> I have added more examples for #3 and #5 to the FLIP. Those are great
> >>>>>> suggestions!
> >>>>>>
> >>>>>> Regarding 2:
> >>>>>>
> >>>>>> There are two kinds of serialization in CloudPickle (which is
> >>>>>> different from Java):
> >>>>>> 1) For classes and functions which can be imported, CloudPickle only
> >>>>>> serializes the full path of the class or function (just like a Java
> >>>>>> class name).
> >>>>>> 2) For classes and functions which cannot be imported, CloudPickle
> >>>>>> serializes the full content of the class or function.
> >>>>>> For #2, this means that we cannot just store the full path of the
> >>>>>> class or function.
> >>>>>>
> >>>>>> The above serialization is recursive.
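
The two cases above can be illustrated with a minimal sketch. It uses only
the standard-library pickle module, not CloudPickle itself: pickle handles
case 1 (by reference) and refuses case 2, which is the gap CloudPickle
fills by serializing the full content. The helper names
serialize_by_reference and can_pickle_by_reference are made up for this
illustration.

```python
import pickle


def serialize_by_reference(obj):
    """Pickle an importable object; the payload stores its module path and name."""
    return pickle.dumps(obj)


def can_pickle_by_reference(obj):
    """Return True if the standard pickler can locate obj by its import path."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# Case 1: an importable function. Only a reference such as "builtins.len"
# is written, so loading it simply looks the function up again.
payload = serialize_by_reference(len)
restored = pickle.loads(payload)

# Case 2: a lambda has no importable path, so a by-reference pickle is
# impossible. CloudPickle would serialize the function body instead.
add_one = lambda i: i + 1
lambda_is_importable = can_pickle_by_reference(add_one)
```

Because case 1 stores only a path, it breaks if that path later changes,
which is the backwards-compatibility concern discussed in this thread.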
> >>>>>>
> >>>>>> However, there is indeed a backwards-compatibility problem when the
> >>>>>> module path of the parent class changes. But I think this is a rare
> >>>>>> case and acceptable. E.g., in the Flink framework we never change the
> >>>>>> module path of a user interface if we want to keep backwards
> >>>>>> compatibility. For user code, if users change the interface of a UDF's
> >>>>>> parent, they should re-register their functions.
> >>>>>>
> >>>>>> If we do not want to support #2, we can store the full path of the
> >>>>>> class or function; in that case we have no backwards-compatibility
> >>>>>> problem. But I think #2 is very convenient for users.
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>> Regarding 4:
> >>>>>> As I mentioned earlier, there may be built-in Python functions, and I
> >>>>>> think language is a "function" concept. Function and language are
> >>>>>> orthogonal concepts.
> >>>>>> We may have R, Go and other language functions in the future, not only
> >>>>>> user-defined but also built-in functions.
> >>>>>>
> >>>>>> You are right that users will not set this method, and for Python
> >>>>>> functions it will be set in the code-generated Java function by the
> >>>>>> framework. So I think we should declare getLanguage() in
> >>>>>> FunctionDefinition for now.
> >>>>>> (I'm not quite sure what you mean by saying that getKind() is final in
> >>>>>> UserDefinedFunction?)
> >>>>>>
> >>>>>> Best,
> >>>>>> Jincheng
> >>>>>>
> >>>>>> Timo Walther <twal...@apache.org> wrote on Tue, Sep 3, 2019 at 6:01 PM:
> >>>>>>
> >>>>>>> Hi Jincheng,
> >>>>>>>
> >>>>>>> thanks for your response.
> >>>>>>>
> >>>>>>> 2. Serializability of functions: Using some arbitrary serialization
> >>>>>>> format for shipping a function to a worker sounds fine to me. But once
> >>>>>>> we store functions in the catalog we need to think about backwards
> >>>>>>> compatibility, evolution of interfaces, etc. I'm not sure if
> >>>>>>> CloudPickle is the right long-term storage format for this. If we
> >>>>>>> don't think about this in advance, we are basically violating our code
> >>>>>>> quality guide [1] of "never use Java serialization", just in the
> >>>>>>> Python way. We are using the RPC serialization for persistence.
> >>>>>>>
> >>>>>>> 3. TableEnvironment: Can you add some examples to the FLIP? Because
> >>>>>>> API code like the following is not covered there:
> >>>>>>>
> >>>>>>> self.t_env.register_function(
> >>>>>>>     "add_one",
> >>>>>>>     udf(lambda i: i + 1, DataTypes.BIGINT(), DataTypes.BIGINT()))
> >>>>>>> self.t_env.register_function(
> >>>>>>>     "subtract_one",
> >>>>>>>     udf(SubtractOne(), DataTypes.BIGINT(), DataTypes.BIGINT()))
> >>>>>>> self.t_env.register_function("add", add)
> >>>>>>>
> >>>>>>> 4. FunctionDefinition: Your response still doesn't answer my
> >>>>>>> question entirely. Why do we need FunctionDefinition.getLanguage() if
> >>>>>>> this is a "user-defined function" concept and not a "function"
> >>>>>>> concept? In any case, users should not be able to set this method. So
> >>>>>>> it must be final in UserDefinedFunction, similar to getKind().
> >>>>>>>
> >>>>>>> 5. Function characteristics: If UserDefinedFunction is defined in
> >>>>>>> Python, why is it not used in your example in FLIP-58? Could you
> >>>>>>> extend the example to show how to specify these attributes in the
> >>>>>>> FLIP?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>> [1]
> >>>>
> https://flink.apache.org/contributing/code-style-and-quality-java.html
> >>>>>>>
> >>>>>>> On 02.09.19 15:35, jincheng sun wrote:
> >>>>>>>> Hi Timo,
> >>>>>>>>
> >>>>>>>> Many thanks for your feedback. I would like to share my thoughts with
> >>>>>>>> you inline. :)
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jincheng
> >>>>>>>>
> >>>>>>>> Timo Walther <twal...@apache.org> wrote on Mon, Sep 2, 2019 at 5:04 PM:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> the FLIP looks awesome. However, I would like to discuss the
> >> changes
> >>>> to
> >>>>>>>>> the user-facing parts again. Some feedback:
> >>>>>>>>>
> >>>>>>>>> 1. DataViews: With the current non-annotation design for
> DataViews,
> >>>> we
> >>>>>>>>> cannot perform eager state declaration, right? At which point
> >> during
> >>>>>>>>> execution do we know which state is required by the function? We
> >>>> need to
> >>>>>>>>> instantiate the function first, right?
> >>>>>>>>>
> >>>>>>>>>> We will analyze the Python AggregateFunction and extract the
> >>>>>>>> DataViews used in it. This can be done by instantiating the Python
> >>>>>>>> AggregateFunction, creating an accumulator by calling the method
> >>>>>>>> create_accumulator, and then analyzing the created accumulator. This
> >>>>>>>> is actually similar to the way the codegen logic processes a Java
> >>>>>>>> AggregateFunction. The extracted DataViews can then be used to
> >>>>>>>> construct the StateDescriptors in the operator, i.e., we hold the
> >>>>>>>> state spec and the state descriptor id in the Java operator, and the
> >>>>>>>> Python worker can access the state by specifying the corresponding
> >>>>>>>> state descriptor id.
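
The extraction step described above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not the FLIP's
implementation: the MapView and ListView classes here are bare stand-ins,
CountDistinct is a made-up aggregate function, and extract_data_views and
the (descriptor id, field name, view kind) tuples are hypothetical.

```python
class MapView:
    """Hypothetical stand-in for a map-backed state view."""

    def __init__(self):
        self.data = {}


class ListView:
    """Hypothetical stand-in for a list-backed state view."""

    def __init__(self):
        self.data = []


class CountDistinct:
    """Made-up aggregate function whose accumulator holds one MapView."""

    def create_accumulator(self):
        return {"seen": MapView(), "count": 0}


def extract_data_views(agg_function):
    """Create one accumulator and record which of its fields are views.

    Each view gets a sequential descriptor id that an operator could use
    to build the matching state descriptor.
    """
    accumulator = agg_function.create_accumulator()
    specs = []
    for field_name, value in accumulator.items():
        if isinstance(value, (MapView, ListView)):
            specs.append((len(specs), field_name, type(value).__name__))
    return specs


specs = extract_data_views(CountDistinct())
```

Note that the function has to be instantiated before its state is known,
which is the point raised in question 1.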
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> 2. Serializability of functions: How do we ensure serializability
> >> of
> >>>>>>>>> functions for catalog persistence? In the Scala/Java API, we
> would
> >>>> like
> >>>>>>>>> to register classes instead of instances soon. This is the only
> way
> >>>> to
> >>>>>>>>> store a function properly in a catalog or we need some
> >>>>>>>>> serialization/deserialization logic in the function interfaces to
> >>>>>>>>> convert an instance to string properties.
> >>>>>>>>>
> >>>>>>>>>> The Python function will be serialized with CloudPickle anyway in
> >>>>>>>> the Python API, as we need to transfer it to the Python worker, which
> >>>>>>>> can then deserialize it for execution. The serialized Python function
> >>>>>>>> can be stored in the catalog.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> 3. TableEnvironment: What is the signature of
> >>>> `register_function(self,
> >>>>>>>>> name, function)`? Does it accept both a class and function? Like
> >>>> `class
> >>>>>>>>> Sum` and `def split()`? Could you add some examples for
> registering
> >>>> both
> >>>>>>>>> kinds of functions?
> >>>>>>>>>
> >>>>>>>>>> What you mentioned is already supported. You can find an example
> >>>>>>>> in the POC code:
> >>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://github.com/dianfu/flink/commit/93f41ba173482226af7513fdec5acba72b274489#diff-34f619b31a7e38604e22a42a441fbe2fR26
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> 4. FunctionDefinition: Function definition is not a user-defined
> >>>>>>>>> function definition. It is the highest interface for both
> >>>> user-defined
> >>>>>>>>> and built-in functions. I'm not sure if getLanguage() should be
> >> part
> >>>> of
> >>>>>>>>> this interface or one-level down which would be
> >>>> `UserDefinedFunction`.
> >>>>>>>>> Built-in functions will never be implemented in a different
> >>>> language. In
> >>>>>>>>> any case, I would vote for removing the UNKNOWN language, because
> >> it
> >>>>>>>>> does not solve anything. Why should a user declare a function
> that
> >>>> the
> >>>>>>>>> runtime can not handle? I also find the term `JAVA` confusing for
> >>>> Scala
> >>>>>>>>> users. How about `FunctionLanguage.JVM` instead?
> >>>>>>>>>
> >>>>>>>>>> Actually we may have built-in Python functions in the future.
> >>>>>>>> Regarding the following expression: py_udf1(a, b) + py_udf2(c), if
> >>>>>>>> there is a built-in Python function for the '+' operator, then we
> >>>>>>>> don't need to mix Java and Python UDFs. In this way, we can improve
> >>>>>>>> the execution performance.
> >>>>>>>> Regarding removing FunctionLanguage.UNKNOWN and renaming
> >>>>>>>> FunctionLanguage.Java to FunctionLanguage.JVM, it makes sense to me.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> 5. Function characteristics: In the current design, function
> >>>>>>>>> classes do not extend from any upper class. How can users declare
> >>>>>>>>> characteristics that are present in `FunctionDefinition`, like
> >>>>>>>>> determinism, requirements, or soon also monotonicity?
> >>>>>>>>>
> >>>>>>>>>> Actually we have defined 'UserDefinedFunction', which is the base
> >>>>>>>> class for all user-defined functions.
> >>>>>>>> We can define determinism, requirements, etc. in this class.
> >>>>>>>> Currently, defining determinism is already supported.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 02.09.19 03:38, Shaoxuan Wang wrote:
> >>>>>>>>>> Hi Jincheng, Fudian, and Aljoscha,
> >>>>>>>>>> I am assuming the proposed python UDX can also be applied to
> Flink
> >>>> SQL.
> >>>>>>>>>> Is this correct? If yes, I would suggest to title the FLIP as
> >> "Flink
> >>>>>>>>> Python
> >>>>>>>>>> User-Defined Function" or "Flink Python User-Defined Function
> for
> >>>>>>> Table".
> >>>>>>>>>> Regards,
> >>>>>>>>>> Shaoxuan
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Aug 28, 2019 at 12:22 PM jincheng sun <
> >>>>>>> sunjincheng...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks for the feedback, Bowen!
> >>>>>>>>>>>
> >>>>>>>>>>> Many thanks for creating the FLIP and bringing up the VOTE, Dian!
> >>>>>>>>>>>
> >>>>>>>>>>> Best, Jincheng
> >>>>>>>>>>>
> >>>>>>>>>>> Dian Fu <dian0511...@gmail.com> wrote on Wed, Aug 28, 2019 at 11:32 AM:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have started a voting thread [1]. Thanks a lot for your help
> >>>>>>>>>>>> with creating the FLIP, @Jincheng.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I really appreciate your comments. I have replied to you in the
> >>>>>>>>>>>> design doc. As it seems that the comments don't affect the
> >>>>>>>>>>>> overall design, I'll not cancel the vote for now and we can
> >>>>>>>>>>>> continue the discussion in the design doc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Dian
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Aug 28, 2019, at 11:05 AM, Bowen Li <bowenl...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Jincheng and Dian,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry for being late to the party. I took a glance at the
> >>>> proposal,
> >>>>>>>>>>> LGTM
> >>>>>>>>>>>> in
> >>>>>>>>>>>>> general, and I left only a couple comments.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Aug 26, 2019 at 8:05 PM Dian Fu <
> dian0511...@gmail.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>> Hi Jincheng,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks! It works.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Dian
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Aug 27, 2019, at 10:55 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Dian, can you check if you have edit access? :)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> wrote on Mon, Aug 26, 2019 at 10:52 AM:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Jincheng,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I appreciate the kind tips and the offer of help. I definitely
> >>>>>>>>>>>>>>>> need it!
> >>>>>>>>>>>>>>>> Could you grant me write permission for Confluence? My ID:
> >>>>>>>>>>>>>>>> Dian Fu
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> Dian
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Aug 26, 2019, at 9:53 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for your feedback, Hequn & Dian.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Dian, I am glad to see that you want to help create the FLIP!
> >>>>>>>>>>>>>>>>> Everyone has a first time, and I am very willing to help you
> >>>>>>>>>>>>>>>>> complete your first FLIP creation. Here are some tips:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - First I'll give your account write permission for
> >>>>>>>>>>>>>>>>> Confluence.
> >>>>>>>>>>>>>>>>> - Before creating the FLIP, please have a look at the FLIP
> >>>>>>>>>>>>>>>>> Template [1]. (It's better to know more about FLIPs by
> >>>>>>>>>>>>>>>>> reading [2].)
> >>>>>>>>>>>>>>>>> - Create the Flink Python UDF related JIRAs after completing
> >>>>>>>>>>>>>>>>> the VOTE of the FLIP. (I think you can also bring up the
> >>>>>>>>>>>>>>>>> VOTE thread, if you want!)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you encounter any problems during this period, feel free
> >>>>>>>>>>>>>>>>> to tell me so that we can solve them together. :)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Jincheng
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
> >>>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >>>>>>>>>>>>>>>>> Hequn Cheng <chenghe...@gmail.com> wrote on Fri, Aug 23, 2019 at 11:54 AM:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> +1 for starting the vote.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks Jincheng a lot for the discussion.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best, Hequn
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu <
> >>>>>>> dian0511...@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>> Hi Jincheng,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> +1 to start creating the FLIP and the VOTE on this feature.
> >>>>>>>>>>>>>>>>>>> I'm willing to help with creating the FLIP if you don't
> >>>>>>>>>>>>>>>>>>> mind. As I haven't created a FLIP before, it would be great
> >>>>>>>>>>>>>>>>>>> if you could help with this. :)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>> Dian
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Aug 22, 2019, at 11:41 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks a lot for your feedback. If there are no more
> >>>>>>>>>>>>>>>>>>>> suggestions and comments, I think it's better to initiate
> >>>>>>>>>>>>>>>>>>>> a vote to create a FLIP for Apache Flink Python UDFs.
> >>>>>>>>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best, Jincheng
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> jincheng sun <sunjincheng...@gmail.com> wrote on Thu, Aug 15, 2019 at 12:54 AM:
> >>>>>>>>>>>>>>>>>>>>> Hi Thomas,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks for your confirmation and the very important
> >>>>>>>>>>>>>>>>>>>>> reminder about bundle processing.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I have added a description of how to perform bundle
> >>>>>>>>>>>>>>>>>>>>> processing from the perspective of checkpoints and
> >>>>>>>>>>>>>>>>>>>>> watermarks. Feel free to leave comments if anything is
> >>>>>>>>>>>>>>>>>>>>> not described clearly.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> Jincheng
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Dian Fu <dian0511...@gmail.com> wrote on Wed, Aug 14, 2019 at 10:08 AM:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Thomas,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks a lot for the suggestions.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Regarding bundle processing, there is a section
> >>>>>>>>>>>>>>>>>>>>>> "Checkpoint" [1] in the design doc which talks about how
> >>>>>>>>>>>>>>>>>>>>>> to handle the checkpoint.
> >>>>>>>>>>>>>>>>>>>>>> However, I think you are right that we should talk more
> >>>>>>>>>>>>>>>>>>>>>> about it, such as what bundle processing is, how it
> >>>>>>>>>>>>>>>>>>>>>> affects the checkpoint and watermark, how to handle the
> >>>>>>>>>>>>>>>>>>>>>> checkpoint and watermark, etc.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
> >>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>>> Dian
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise <t...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Jincheng,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks for putting this together. The proposal is very
> >>>>>>>>>>>>>>>>>>>>>>> detailed, thorough and for me as a Beam Flink runner
> >>>>>>>>>>>>>>>>>>>>>>> contributor easy to understand :)
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> One thing that you should probably detail more is the
> >>>>>>>>>>>>>>>>>>>>>>> bundle processing. It is critically important for
> >>>>>>>>>>>>>>>>>>>>>>> performance that multiple elements are processed in a
> >>>>>>>>>>>>>>>>>>>>>>> bundle. The default bundle size in the Flink runner is
> >>>>>>>>>>>>>>>>>>>>>>> 1s or 1000 elements, whichever comes first. And for
> >>>>>>>>>>>>>>>>>>>>>>> streaming, you can find the logic necessary to align
> >>>>>>>>>>>>>>>>>>>>>>> the bundle processing with watermarks and checkpointing
> >>>>>>>>>>>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
> >>>>>>>>>>>>>>>>>>>>>>> Thomas
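
The "1s or 1000 elements, whichever comes first" rule mentioned above can
be sketched as a small buffer. This is a simplified illustration, not the
Beam or Flink implementation: the Bundler class and all of its parameter
names are hypothetical, and the time limit is only checked when an element
arrives, whereas a real operator would also use a timer.

```python
import time


class Bundler:
    """Hypothetical buffer that closes a bundle on a size or time limit."""

    def __init__(self, process_bundle, max_elements=1000, max_seconds=1.0,
                 clock=time.monotonic):
        self._process_bundle = process_bundle
        self._max_elements = max_elements
        self._max_seconds = max_seconds
        self._clock = clock
        self._buffer = []
        self._started_at = None

    def add(self, element):
        """Buffer one element and flush if either limit has been reached."""
        if not self._buffer:
            self._started_at = self._clock()
        self._buffer.append(element)
        too_many = len(self._buffer) >= self._max_elements
        too_old = self._clock() - self._started_at >= self._max_seconds
        if too_many or too_old:
            self.finish_bundle()

    def finish_bundle(self):
        """Hand the buffered elements to the processing callback.

        Assumption: an operator would also call this before taking a
        checkpoint or forwarding a watermark, so no elements are in flight.
        """
        if self._buffer:
            self._process_bundle(self._buffer)
            self._buffer = []
            self._started_at = None


bundles = []
bundler = Bundler(bundles.append, max_elements=3, max_seconds=10.0,
                  clock=lambda: 0.0)
for i in range(7):
    bundler.add(i)
bundler.finish_bundle()
```

With a bundle size of 3 and a frozen clock, the seven elements end up in
two full bundles plus a final partial one flushed explicitly.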
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <
> >>>>>>>>>>>>>>>>>>> sunjincheng...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> The Python Table API (without Python UDF support) is
> >>>>>>>>>>>>>>>>>>>>>>>> already supported and will be available in the coming
> >>>>>>>>>>>>>>>>>>>>>>>> release 1.9.
> >>>>>>>>>>>>>>>>>>>>>>>> As Python UDFs are very important for Python users,
> >>>>>>>>>>>>>>>>>>>>>>>> we'd like to start the discussion about Python UDF
> >>>>>>>>>>>>>>>>>>>>>>>> support in the Python Table API.
> >>>>>>>>>>>>>>>>>>>>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline
> >>>>>>>>>>>>>>>>>>>>>>>> and have drafted a design doc [1]. It includes the
> >>>>>>>>>>>>>>>>>>>>>>>> following items:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function interfaces.
> >>>>>>>>>>>>>>>>>>>>>>>> - The user-defined function execution architecture.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> As mentioned by many people in the previous discussion
> >>>>>>>>>>>>>>>>>>>>>>>> thread [2], a portability framework was introduced in
> >>>>>>>>>>>>>>>>>>>>>>>> Apache Beam in recent releases. It provides
> >>>>>>>>>>>>>>>>>>>>>>>> well-defined, language-neutral data structures and
> >>>>>>>>>>>>>>>>>>>>>>>> protocols for language-neutral user-defined function
> >>>>>>>>>>>>>>>>>>>>>>>> execution. This design is based on Beam's portability
> >>>>>>>>>>>>>>>>>>>>>>>> framework. We will introduce how to make use of Beam's
> >>>>>>>>>>>>>>>>>>>>>>>> portability framework for user-defined function
> >>>>>>>>>>>>>>>>>>>>>>>> execution: data transmission, state access,
> >>>>>>>>>>>>>>>>>>>>>>>> checkpoint, metrics, logging, etc.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Considering that the design relies on Beam's
> >>>>>>>>>>>>>>>>>>>>>>>> portability framework for Python user-defined function
> >>>>>>>>>>>>>>>>>>>>>>>> execution and not all the contributors in the Flink
> >>>>>>>>>>>>>>>>>>>>>>>> community are familiar with Beam's portability
> >>>>>>>>>>>>>>>>>>>>>>>> framework, we have done a prototype [3] as a proof of
> >>>>>>>>>>>>>>>>>>>>>>>> concept and also for ease of understanding of the
> >>>>>>>>>>>>>>>>>>>>>>>> design.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Any feedback is welcome.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>> Jincheng
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
> >>>>>>>>>>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>
> >>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
> >>>>>>>>>>>>>>>>>>>>>>>> [3]
> https://github.com/dianfu/flink/commits/udf_poc
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>
