Hi everyone,

As we all know, Flink provides three layered APIs: the ProcessFunctions, the DataStream API, and the SQL & Table API. Each API offers a different trade-off between conciseness and expressiveness and targets different use cases [1].

The SQL & Table API is already supported in PyFlink. It provides relational operations as well as user-defined functions, which makes it convenient for users who are familiar with Python and relational programming. The DataStream API and ProcessFunctions, on the other hand, provide more general APIs for implementing stream processing applications. The ProcessFunctions expose time and state, which are the fundamental building blocks of any kind of streaming application. To support more use cases, we plan to bring all of these APIs to PyFlink.

In this discussion (FLIP-130), we propose to support the stateless part of the Python DataStream API. For more details, please refer to the FLIP wiki page [2]. If you are interested in the stateful part, you can also take a look at the design doc [3], which we will discuss in a separate FLIP.

Any comments are highly appreciated!

[1] https://flink.apache.org/flink-applications.html#layered-apis
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298
[3] https://docs.google.com/document/d/1H3hz8wuk228cDBhQmQKNw3m1q5gDAMkwTDEwnj3FBI/edit?usp=sharing

Best,
Shuiqiang