Hi Thomas,
Thanks for your quick reply. I will share my further thoughts below. :)

1. Even without user-defined function (UDF) support, the Python Table API
can do a lot of things. (Of course, we do still need to add UDF support to
the Python Table API.)

> Do you have use cases where the Python table API can be applied without UDF
> support?


Without user-defined function (UDF) support, the Python Table API can
already do a lot of things, such as ETL, joins, aggregations,
Tumble/Slide/Session windows, etc., because Flink has hundreds of built-in
scalar and aggregate functions (details can be found in
BuiltInFunctionDefinitions
<https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/expressions/BuiltInFunctionDefinitions.java>).
Furthermore, we can add more commonly used built-in
ScalarFunctions/TableFunctions/AggregateFunctions if we need them.
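To make this concrete, here is a toy sketch of the kind of ETL/aggregation job that is expressible with built-in functions alone. Everything in it (the `Table` class, the method names, the plan format) is hypothetical and only illustrates the API shape, not Flink's actual classes:

```python
# Illustrative sketch only: a toy stand-in for a Python Table API, showing
# the kind of job (filter + keyed aggregation) that needs no UDFs at all.
# Class and method names are hypothetical, not Flink's real API.

class Table:
    """Toy logical-plan builder that just records the applied operations."""

    def __init__(self, plan):
        self._plan = plan

    def where(self, predicate):
        return Table(self._plan + [("where", predicate)])

    def group_by(self, keys):
        return Table(self._plan + [("group_by", keys)])

    def select(self, fields):
        return Table(self._plan + [("select", fields)])

    def plan(self):
        return " -> ".join(f"{op}({arg})" for op, arg in self._plan)


# An ETL-style pipeline using only built-in expressions (no UDFs):
orders = Table([("scan", "Orders")])
result = (orders
          .where("amount > 10")          # built-in comparison
          .group_by("user")              # keyed aggregation
          .select("user, amount.sum"))   # built-in SUM aggregate

print(result.plan())
# scan(Orders) -> where(amount > 10) -> group_by(user) -> select(user, amount.sum)
```

The point is only that filter/group/aggregate pipelines like this one stay entirely within built-in functionality.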

Just as we did for the Java and Scala language support, we can support the
Table API first and then add support for UDFs.

Just to clarify: we will support UDFs in the Python Table API, but in the
next phase. This has already been mentioned in the "Future or next step"
section of the Google doc.

2. Flink itself should have its own Python Table API:

From my point of view, whether or not Beam supports a Python Table API,
Flink itself should have its own, with the interfaces listed in the design
document. The Python Table API not only requires a standard Table interface
definition, but also needs definitions of Flink's internal
TableEnvironment, TableConfig, Window, ConnectorDescriptor, TableSource,
TableSink, Catalog, and other interfaces closely tied to Flink. All of
these need first-class support in the Python Table API. At the same time,
the Flink Python Table API also has to support interactive queries. From
the perspective of Flink's functional completeness and user-friendliness,
Flink should have its own Python Table API interface definition, which is
at the heart of the FLIP-38 proposal. Of course, I would very much like to
see Beam provide good support for Flink in its Python Table API. I think
the two can coexist.
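As an illustration of what "Flink's own interface definitions" could mean on the Python side, here is a hypothetical skeleton. The names mirror the Java Table API, but none of this is the actual proposed code; it only shows the kind of surface the FLIP would have to define:

```python
# Hypothetical sketch of Flink-specific Python interface definitions.
# Names follow the Java Table API; the real FLIP-38 interfaces may differ.
from abc import ABC, abstractmethod


class TableConfig:
    """Holds job-level configuration, e.g. the session time zone."""

    def __init__(self, timezone="UTC"):
        self.timezone = timezone


class TableEnvironment(ABC):
    """Entry point: registers sources/sinks and creates Table objects."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def scan(self, table_name):
        """Return a Table for a table registered in the catalog."""

    @abstractmethod
    def register_table_sink(self, name, sink):
        """Register a TableSink under the given name."""
```

A concrete environment (batch or streaming) would then subclass `TableEnvironment` and wire these methods to Flink's internals.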

3. Support for Python Table API and support for UDF can be done separately:

Supporting the Flink Python Table API opens a door from the Flink community
to Python users. Once the language is open, we need to provide more
functionality to those users, including the various operators, such as
select/filter/join/window/aggregate, and of course the UDFs added later.

    - Support for the Python Table API (with its various operators): what we
need to solve is how to convert Python to Java on the client side, just a
very thin conversion layer.
    - Support for UDFs (scalar functions/table functions/aggregate
functions): what we need to solve is how Java communicates with Python at
the runtime level, how a Python user-defined aggregate function uses Flink
state (state is Flink-specific), the management of the Python environment
at runtime, and complex issues such as performance. I agree that Flink can
work with Beam on UDF support, and I would love to see UDF support in the
Python Table API built on top of Beam. :)
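For the first bullet, the "thin conversion layer" could look roughly like the following toy sketch. In practice something like Py4J would provide the Java gateway; here the Java side is faked by a recorder, and none of these names are real Flink classes:

```python
# Toy sketch of the client-side "thin layer": each Python call is simply
# forwarded to a corresponding Java object. The Java side is faked by a
# recorder here; a real implementation might use a Py4J gateway instead.

class FakeJavaTable:
    """Stands in for the Java Table object behind the gateway."""

    def __init__(self, calls=None):
        self.calls = calls or []

    def invoke(self, method, arg):
        # A real gateway would invoke the Java method; we only record it.
        return FakeJavaTable(self.calls + [(method, arg)])


class Table:
    """Python-facing API: a thin wrapper that delegates to the Java side."""

    def __init__(self, j_table):
        self._j_table = j_table

    def filter(self, predicate):
        return Table(self._j_table.invoke("filter", predicate))

    def select(self, fields):
        return Table(self._j_table.invoke("select", fields))


t = Table(FakeJavaTable())
t2 = t.filter("a > 0").select("a, b")
print(t2._j_table.calls)
# [('filter', 'a > 0'), ('select', 'a, b')]
```

Note that the wrapper holds no query logic of its own: all planning and optimization stays on the Java side, which is why this layer can remain so thin.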

So I think the problems to be solved for supporting the Python Table API
and for supporting UDFs are quite different, and we can discuss them
separately.

To briefly summarize, I recommend that we discuss support for the Python
Table API and support for UDFs in separate threads. Let's first discuss
support for the Python Table API. What do you think?

Regards,
Jincheng


Thomas Weise <thomas.we...@gmail.com> 于2019年4月5日周五 下午12:11写道:

> Hi Jincheng,
>
> >
> > Yes, we can add use case examples in both google doc and FLIP, I had
> > already add the simple usage in the google doc, here I want to know which
> > kind of examples you want? :)
> >
>
> Do you have use cases where the Python table API can be applied without UDF
> support?
>
> (And where the same could not be accomplished with just SQL.)
>
>
> > The very short answer to UDF support is Yes. As you said, we need UDF
> > support on the Python Table API, including (UDF, UDTF, UDAF). This needs
> to
> > be discussed after basic Python TableAPI supported. Because UDF involves
> > the management of the python environment, Runtime level Java and Runtime
> > communication, and UDAF in Flink also involves the application of State,
> so
> > this is a topic that is worth discussing in depth in a separate thread.
> >
>
> The current proposal for job submission touches something that Beam
> portability already had to solve.
>
> If we think that the Python table API will only be useful with UDF support
> (question above), then it may be better to discuss the first step with the
> final goal in mind. If we find that Beam can be used for the UDF part then
> approach 1 vs. approach 2 in the doc (for the client side language
> boundary) may look different.
>
>
> >
> > I think that no matter how the Flink and Beam work together on the UDF
> > level, it will not affect the current Python API (interface), we can
> first
> > support the Python API in Flink. Then start the UDX (UDF/UDTF/UDAF)
> > support.
> >
> >
> I agree that the client side API should not be affected.
>
>
> > And great thanks for your valuable comments in Google doc! I will
> feedback
> > you in the google doc. :)
> >
> >
> > Regards,
> > Jincheng
> >
> > Thomas Weise <t...@apache.org> 于2019年4月4日周四 上午8:03写道:
> >
> > > Thanks for putting this proposal together.
> > >
> > > It would be nice, if you could share a few use case examples (maybe add
> > > them as section to the FLIP?).
> > >
> > > The reason I ask: The table API is immensely useful, but it isn't clear
> > to
> > > me what value other language bindings provide without UDF support. With
> > > FLIP-38 it will be possible to write a program in Python, but not
> execute
> > > Python functions. Without UDF support, isn't it possible to achieve
> > roughly
> > > the same with plain SQL? In which situation would I use the Python API?
> > >
> > > There was related discussion regarding UDF support in [1]. If the
> > > assumption is that such support will be added later, then I would like
> to
> > > circle back to the question why this cannot be built on top of Beam? It
> > > would be nice to clarify the bigger goal before embarking for the first
> > > milestone.
> > >
> > > I'm going to comment on other things in the doc.
> > >
> > > [1]
> > >
> > >
> >
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> > >
> > > Thomas
> > >
> > >
> > > On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen <suez1...@gmail.com> wrote:
> > >
> > > > Thanks a lot for driving the FLIP, jincheng. The approach looks
> > > > good. Adding multi-lang support sounds a promising direction to
> expand
> > > the
> > > > footprint of Flink. Do we have plan for adding Golang support? As
> many
> > > > backend engineers nowadays are familiar with Go, but probably not
> Java
> > as
> > > > much, adding Golang support would significantly reduce their friction
> > to
> > > > use Flink. Also, do we have a design for multi-lang UDF support, and
> > > what's
> > > > timeline for adding DataStream API support? We would like to help and
> > > > contribute as well as we do have similar need internally at our
> > company.
> > > > Thanks a lot.
> > > >
> > > > Shuyi
> > > >
> > > > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <
> sunjincheng...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > > As Xianda brought up in the previous email, There are a large
> number
> > of
> > > > > data analysis users who want flink to support Python. At the Flink
> > API
> > > > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API
> > > will
> > > > > become the first-class citizen. Table API is declarative and can be
> > > > > automatically optimized, which is mentioned in the Flink mid-term
> > > roadmap
> > > > > by Stephan. So we first considering supporting Python at the Table
> > > level
> > > > to
> > > > > cater to the current large number of analytics users. For further
> > > promote
> > > > > Python support in flink table level. Dian, Wei and I discussed
> > offline
> > > a
> > > > > bit and came up with an initial features outline as follows:
> > > > >
> > > > > - Python TableAPI Interface
> > > > >   Introduce a set of Python Table API interfaces, including
> interface
> > > > > definitions such as Table, TableEnvironment, TableConfig, etc.
> > > > >
> > > > > - Implementation Architecture
> > > > >   We will offer two alternative architecture options, one for pure
> > > Python
> > > > > language support and one for extended multi-language design.
> > > > >
> > > > > - Job Submission
> > > > >   Provide a way that can submit(local/remote) Python Table API
> jobs.
> > > > >
> > > > > - Python Shell
> > > > >   Python Shell is to provide an interactive way for users to write
> > and
> > > > > execute flink Python Table API jobs.
> > > > >
> > > > >
> > > > > The design document for FLIP-38 can be found here:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> > > > >
> > > > > I am looking forward to your comments and feedback.
> > > > >
> > > > > Best,
> > > > > Jincheng
> > > > >
> > > >
> > >
> >
>