Re: [DISCUSS] FLIP-146: Improve new TableSource and TableSink interfaces

Aljoscha Krettek Thu, 24 Sep 2020 07:41:19 -0700

Thanks for the proposal! I think the use cases that we are trying to solve are indeed valid. However, I think we might have to take a step back to look at what we're trying to solve and how we can solve it.

The FLIP seems to have two broader topics: 1) add "get parallelism" to sinks/sources 2) let users write DataStream topologies for sinks/sources. I'll treat them separately below.

I think we should not add "get parallelism" to the Table Sink API because I think it's the wrong level of abstraction. The Table API connectors are (or should be) more or less thin wrappers around "physical" connectors. By "physical" I mean the underlying (mostly DataStream API) connectors. For example, with the Kafka Connector the Table API connector just does the configuration parsing and determines a good (de)serialization format and then creates the underlying FlinkKafkaConsumer/FlinkKafkaProducer.
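To make the "thin wrapper" idea concrete, here is a minimal Java sketch. It deliberately does not use Flink's real interfaces: `ThinWrapperSketch`, `PhysicalKafkaSource`, `createSource`, and `require` are hypothetical names invented for illustration, and the option keys are only examples. The point is that the Table-level code only parses configuration and picks a format, then hands everything to the underlying connector.

```java
import java.util.HashMap;
import java.util.Map;

public class ThinWrapperSketch {

    /** Hypothetical stand-in for a "physical" connector such as FlinkKafkaConsumer. */
    static final class PhysicalKafkaSource {
        final String topic;
        final String bootstrapServers;
        final String format;

        PhysicalKafkaSource(String topic, String bootstrapServers, String format) {
            this.topic = topic;
            this.bootstrapServers = bootstrapServers;
            this.format = format;
        }
    }

    /**
     * Hypothetical Table-level factory: it only validates the options,
     * selects the (de)serialization format, and creates the physical connector.
     * It makes no runtime decisions such as parallelism.
     */
    static PhysicalKafkaSource createSource(Map<String, String> options) {
        String topic = require(options, "topic");
        String servers = require(options, "properties.bootstrap.servers");
        String format = require(options, "format");
        return new PhysicalKafkaSource(topic, servers, format);
    }

    /** Returns the value for a required option or fails with a clear message. */
    static String require(Map<String, String> options, String key) {
        String value = options.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required option: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("topic", "orders");
        options.put("properties.bootstrap.servers", "localhost:9092");
        options.put("format", "json");

        PhysicalKafkaSource source = createSource(options);
        System.out.println(source.topic + " / " + source.format);
    }
}
```

In this sketch, anything about how the source actually runs lives in the physical connector, which is where a parallelism setting would have to go if it were added at all.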

If we wanted to add a "get parallelism" it would be in those underlying connectors but I'm also skeptical about adding such a method there because it is a static assignment and would preclude clever optimizations about the parallelism of a connector at runtime. But maybe that's thinking too much about future work so I'm open to discussion there.

Regarding the second point of letting Table connector developers use DataStream: I think we should not do it. One of the purposes of FLIP-95 [1] was to decouple the Table API from the DataStream API for the basic interfaces. Coupling the two too closely at that basic level will make our lives harder in the future when we want to evolve those APIs or when we want the system to be better at choosing how to execute sources and sinks. An example of this is actually the past of the Table API. Before FLIP-95 we had connectors that dealt directly with DataSet and DataStream, meaning that if users wanted their Table Sink to work in both BATCH and STREAMING mode they had to provide two implementations. The trend is towards unifying the sources/sinks to common interfaces that can be used for both BATCH and STREAMING execution but, again, I think exposing DataStream here would be a step in the wrong direction.

I think the solution to the existing user requirement of using DataStream sources and sinks with the Table API should be better interoperability between the two APIs, which is being tackled right now in FLIP-136 [2]. If FLIP-136 is not adequate for the use cases that we're trying to solve here, maybe we should think about FLIP-136 some more.


What do you think?

Best,
Aljoscha

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API
