[jira] [Commented] (FLINK-14807) Add Table#collect api for fetching data to client

Jiangjie Qin (Jira) Tue, 25 Feb 2020 19:32:23 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045113#comment-17045113
 ]


Jiangjie Qin commented on FLINK-14807:
--------------------------------------

We should probably first agree on the right solution to solve this problem and 
then think about the path to achieve that. I believe the Operator Coordinator 
would be the right way to go, for the following reasons:
 # The Operator Coordinator was introduced to provided a bi-direction 
communication mechanism between JM and TM, so it is suitable in this case. And 
in the new Sink design, we will have something like {{SinkCoordinator}} and 
{{SinkWriter}}, just like what we have in the Source.
 # Personally I prefer to have the socket server brought up in the Sink 
Coordinator and then broadcast the port info to the TMs. This simplifies the 
architecture and can remove the restriction of the parallelism on the Sink.

While the SinkCoordinator can solve the communication between JM and TM in both 
control plane via operator events and data plane via a socket server in the 
coordinator, the communication problem between the clients and JM still exists, 
i.e. how to send the records back from JM to the client.

It is true that we can introduce a REST api in JM and let the clients fetch 
records through it, but it would worth thinking if we can have a more generic 
solution that may potentially help in other cases as well. I think it might be 
useful to establish a way for the clients to get information from the Operator 
Coordinators. For example, the REST API in the JM can allow the client to query 
the information of a particular coordinator, and the coordinator can return its 
socket server port back to the client. It is a more flexible and more generic 
solution as in the future, people can have other communication with the 
coordinators without changing any code in the JM.

I also have a questions regarding the failover. When the JM fails over, how can 
the clients connect to the JM after the failover?

> Add Table#collect api for fetching data to client
> -------------------------------------------------
>
>                 Key: FLINK-14807
>                 URL: https://issues.apache.org/jira/browse/FLINK-14807
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>    Affects Versions: 1.9.1
>            Reporter: Jeff Zhang
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>         Attachments: table-collect.png
>
>
> Currently, it is very unconvinient for user to fetch data of flink job unless 
> specify sink expclitly and then fetch data from this sink via its api (e.g. 
> write to hdfs sink, then read data from hdfs). However, most of time user 
> just want to get the data and do whatever processing he want. So it is very 
> necessary for flink to provide api Table#collect for this purpose. 
>  
> Other apis such as Table#head, Table#print is also helpful.  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-14807) Add Table#collect api for fetching data to client

Reply via email to