[ https://issues.apache.org/jira/browse/FLINK-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021889#comment-17021889 ]

Aljoscha Krettek commented on FLINK-14807:
------------------------------------------

I think tying the job name to the environment would be problematic. I have some 
thoughts on this but they span multiple things and are a bit more general than 
this issue. I'll post them here nevertheless:

One environment can spawn multiple jobs if you call {{execute()}} multiple 
times. For batch jobs this is sometimes fine, but it becomes a problem when 
the component (or user) that runs the Flink program expects there to be only 
one job. For example, with {{bin/flink run --fromSavepoint}}, which 
{{execute()}} call should "pick up" the savepoint? Currently it is the first 
one, which might or might not work depending on whether the savepoint is the 
right one for that job. Subsequent {{execute()}} calls will also try to 
restore from that same savepoint, which, again, might or might not fail.
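To make the ambiguity concrete, here is a minimal self-contained sketch (all classes are hypothetical stand-ins, not Flink's actual API) that models how a single {{--fromSavepoint}} setting applies to every {{execute()}} call in the same program:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an execution environment; NOT Flink's API.
// It models that the savepoint path from the CLI is set once and is
// visible to every execute() call.
class MockEnvironment {
    private final String savepointPath;           // set once from --fromSavepoint
    final List<String> restoredFrom = new ArrayList<>();

    MockEnvironment(String savepointPath) {
        this.savepointPath = savepointPath;
    }

    // Every execute() call sees the same savepoint setting.
    void execute(String jobName) {
        restoredFrom.add(jobName + " <- " + savepointPath);
    }
}

class MultiExecuteDemo {
    public static void main(String[] args) {
        MockEnvironment env = new MockEnvironment("s3://savepoints/sp-1");
        env.execute("job-1"); // intended restore target?
        env.execute("job-2"); // also attempts a restore from sp-1
        env.restoredFrom.forEach(System.out::println);
    }
}
```

Both spawned jobs end up associated with the same savepoint, even though at most one of them can be the intended restore target.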

Another scenario where this will be problematic is "driver mode", or a mode 
where we run the {{main()}} method on the "JobManager", for example in the 
per-job standalone entrypoint or potential future other modes where the 
{{main()}} method is run in the cluster.

In general, I now think that the "execute-style" of writing jobs does not work 
well for streaming programs and we might have to re-introduce an interface like
{code}
interface FlinkJob {
  Pipeline getPipeline();
}
{code}
for streaming scenarios.
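As a self-contained sketch of how such an interface could be used (the {{Pipeline}} type and the job class here are hypothetical stand-ins, not Flink internals): the user implements the interface and only *describes* the pipeline, so the framework, not user code, decides when and how it is executed and which savepoint applies.

```java
// Hypothetical stand-in for a pipeline description; NOT Flink's
// org.apache.flink.api.dag.Pipeline.
interface Pipeline {
    String describe();
}

// The proposed interface: user code returns a pipeline instead of
// calling execute() itself.
interface FlinkJob {
    Pipeline getPipeline();
}

// A job implemented against the interface. There is exactly one
// pipeline per job, so "which job gets the savepoint" is unambiguous.
class WordCountJob implements FlinkJob {
    @Override
    public Pipeline getPipeline() {
        return () -> "word-count pipeline";
    }
}

class FlinkJobDemo {
    public static void main(String[] args) {
        FlinkJob job = new WordCountJob();
        System.out.println(job.getPipeline().describe());
    }
}
```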

> Add Table#collect api for fetching data to client
> -------------------------------------------------
>
>                 Key: FLINK-14807
>                 URL: https://issues.apache.org/jira/browse/FLINK-14807
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>    Affects Versions: 1.9.1
>            Reporter: Jeff Zhang
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>
> Currently, it is very inconvenient for users to fetch the data of a Flink 
> job: they have to specify a sink explicitly and then fetch the data from 
> that sink via its own API (e.g. write to an HDFS sink, then read the data 
> back from HDFS). However, most of the time users just want to get the data 
> and do whatever processing they want with it. So it would be very useful 
> for Flink to provide a Table#collect API for this purpose.
>  
> Other APIs such as Table#head and Table#print would also be helpful.
>  
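A self-contained sketch of the usage the description above envisions (the {{Table}} class here is a hypothetical stand-in with in-memory rows, not Flink's Table API):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a table; NOT Flink's Table API.
// It only illustrates the shape of collect()/head()/print().
class Table {
    private final List<String> rows;

    Table(List<String> rows) {
        this.rows = rows;
    }

    // Fetch the data to the client directly, no explicit sink needed.
    Iterator<String> collect() {
        return rows.iterator();
    }

    // Convenience helper: a table with at most the first n rows.
    Table head(int n) {
        return new Table(rows.subList(0, Math.min(n, rows.size())));
    }

    // Convenience helper: print all rows to stdout.
    void print() {
        rows.forEach(System.out::println);
    }
}

class TableCollectDemo {
    public static void main(String[] args) {
        Table t = new Table(Arrays.asList("a", "b", "c"));
        t.head(2).print();
    }
}
```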



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
