[
https://issues.apache.org/jira/browse/FLINK-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045113#comment-17045113
]
Jiangjie Qin commented on FLINK-14807:
--------------------------------------
We should probably first agree on the right solution to solve this problem and
then think about the path to achieve that. I believe the Operator Coordinator
would be the right way to go, for the following reasons:
# The Operator Coordinator was introduced to provide a bidirectional
communication mechanism between JM and TM, so it is suitable in this case. And
in the new Sink design, we will have something like {{SinkCoordinator}} and
{{SinkWriter}}, just like what we have in the Source.
# Personally I prefer to have the socket server brought up in the Sink
Coordinator and then broadcast the port info to the TMs. This simplifies the
architecture and can remove the restriction of the parallelism on the Sink.
While the SinkCoordinator can solve the communication between JM and TM in both
the control plane (via operator events) and the data plane (via a socket server
in the coordinator), the communication problem between the clients and the JM
still exists, i.e. how to send the records back from the JM to the client.
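
To make the socket-server idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: {{CollectSketch}}, {{Coordinator}}, {{writeRecords}} and {{runOnce}} are hypothetical names and not Flink APIs, the "broadcast" of the port is reduced to passing an int to each writer thread, and everything runs on the loopback interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; none of these types are Flink APIs. */
final class CollectSketch {

    /** Stands in for the proposed SinkCoordinator running on the JM. */
    static final class Coordinator implements AutoCloseable {
        private final ServerSocket server;
        private final List<String> records = new ArrayList<>();

        Coordinator() throws IOException {
            // Port 0 asks the OS for a free ephemeral port.
            server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            // Fail instead of hanging forever if a writer never connects.
            server.setSoTimeout(5000);
        }

        /** The value that would be broadcast to the TMs. */
        int port() {
            return server.getLocalPort();
        }

        /** Accepts one connection per writer and reads each until EOF. */
        void acceptWriters(int writers) throws IOException {
            for (int i = 0; i < writers; i++) {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             s.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        records.add(line);
                    }
                }
            }
        }

        List<String> records() {
            return records;
        }

        @Override
        public void close() throws IOException {
            server.close();
        }
    }

    /** Stands in for one SinkWriter subtask on a TM. */
    static void writeRecords(int port, int subtask) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println("subtask-" + subtask);
        }
    }

    /** Runs one coordinator and `parallelism` writers; returns sorted records. */
    static List<String> runOnce(int parallelism) throws Exception {
        try (Coordinator coordinator = new Coordinator()) {
            final int port = coordinator.port(); // the "broadcast" port info
            List<Thread> writers = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                final int subtask = i;
                Thread t = new Thread(() -> {
                    try {
                        writeRecords(port, subtask);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                t.start();
                writers.add(t);
            }
            coordinator.acceptWriters(parallelism);
            for (Thread t : writers) {
                t.join();
            }
            List<String> result = new ArrayList<>(coordinator.records());
            Collections.sort(result);
            return result;
        }
    }
}
```

Calling {{CollectSketch.runOnce(3)}} collects one record from each of three writers. Because every writer connects to the single coordinator socket, the number of writers is not constrained by the sink side in this sketch.
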
It is true that we can introduce a REST API in the JM and let the clients fetch
records through it, but it would be worth thinking about whether we can have a
more generic solution that may potentially help in other cases as well. I think it might be
useful to establish a way for the clients to get information from the Operator
Coordinators. For example, the REST API in the JM can allow the client to query
the information of a particular coordinator, and the coordinator can return its
socket server port back to the client. It is a more flexible and more generic
solution as in the future, people can have other communication with the
coordinators without changing any code in the JM.
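
A rough sketch of what such a generic query path could look like, again in plain Java. This is an assumption-laden illustration: {{CoordinatorQueryRouter}}, {{register}} and {{query}} are hypothetical names, requests and responses are plain strings, and the HTTP layer of the REST API is left out entirely. The point is only that the JM-side routing does not need to understand the payload.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch; not a Flink API. */
final class CoordinatorQueryRouter {

    // Operator id -> handler supplied by that operator's coordinator.
    private final Map<String, Function<String, String>> handlers =
            new ConcurrentHashMap<>();

    /** Called when a coordinator starts and wants to be queryable. */
    void register(String operatorId, Function<String, String> handler) {
        handlers.put(operatorId, handler);
    }

    /**
     * What a generic REST handler in the JM might delegate to. The router
     * forwards the request as-is; empty means "no such coordinator" or
     * "the coordinator had no answer".
     */
    Optional<String> query(String operatorId, String request) {
        Function<String, String> handler = handlers.get(operatorId);
        if (handler == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(handler.apply(request));
    }
}
```

A sink coordinator could, for example, register a handler that answers a "port" request with its socket server port, and a client would then connect to that port directly. Other coordinators could answer different requests without any change to the router.
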
I also have a question regarding failover. When the JM fails over, how can
the clients connect to the JM after the failover?
> Add Table#collect api for fetching data to client
> -------------------------------------------------
>
> Key: FLINK-14807
> URL: https://issues.apache.org/jira/browse/FLINK-14807
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / API
> Affects Versions: 1.9.1
> Reporter: Jeff Zhang
> Priority: Major
> Labels: usability
> Fix For: 1.11.0
>
> Attachments: table-collect.png
>
>
> Currently, it is very inconvenient for users to fetch the data of a Flink job
> unless they specify a sink explicitly and then fetch the data from this sink
> via its API (e.g. write to an HDFS sink, then read the data from HDFS).
> However, most of the time users just want to get the data and do whatever
> processing they want. So Flink should provide a Table#collect API for this purpose.
>
> Other APIs such as Table#head and Table#print would also be helpful.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)