Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Timo Walther Thu, 02 Apr 2020 05:17:29 -0700

Hi Jark,

if catalogs use `CatalogTableSchema` in the future. The source wouldinternally also work with `CatalogTableSchema`. I'm fine with cleaningup the `TableSchema` class but should a source deal with two differentschema classes then?

Another problem that I see is that connectors usually need to performsome index arithmetics. Dealing with TableSchema and additionally withina field with DataType might be a bit inconvenient. A dedicated classwith utilities might be helpful such that not every source needs toimplement the same intersection/diff logic again.


Regards,
Timo


On 02.04.20 14:06, Jark Wu wrote:

Hi Dawid,

How to express projections with TableSchema?

The TableSource holds the original TableSchema (i.e. from DDL) and the
pushed TableSchema represents the schema after projection.
Thus the table source can compare them to figure out changed field orders
or not matched types.
For most sources who maps physical storage by field names (e.g. jdbc,
hbase, json) they can just simply apply the pushed TableSchema.
But sources who maps by field indexes (e.g. csv), they need to figure out
the projected indexes by comparing the original and projected schema.
For example, the original schema is [a: String, b: Int, c: Timestamp], and
b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
source can figure out index=1 is pruned.

How do we express projection of a nested field with TableSchema?

This is the same to the above one. For example, the original schema is [rk:
String, f1 Row<q1 Int, q2 Double>].
If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
Double>].

TableSchema might be used at too many different places for different

responsibilities.
Agree. We have recognized that a structure and builder for pure table
schema is required in many places. But we mixed many concepts of catalog
table schema in TableSchema.
IIRC, in an offline discussion of FLIP-84, we want to introduce a new
`CatalogTableSchema` to represent the schema part of a DDL,
and remove all the watermark, computed column information from TableSchema?
Then `TableSchema` can continue to serve as a pure table schema and it
stays in a good package.

Best,
Jark




On Thu, 2 Apr 2020 at 19:39, Timo Walther <[email protected]> wrote:

Hi Dawid,

thanks for your feedback. I agree with your concerns. I also observed
that TableSchema might be used at too many different places for
different responsibilities.

How about we introduce a helper class for `SupportsProjectionPushDown`
and also `LookupTableSource#Context#getKeys()` to represent nested
structure of names. Data types, constraints, or computed columns are not
necessary at those locations.

We can also add utility methods for connectors to this helper class
there to quickly figuring out differences between the original table
schema and the new one.

SelectedFields {

         private LogicalType orignalRowType; // set by the planner

         private int[][] indices;

         getNames(int... at): String[]

         getNames(String... at): String[]

         getIndices(int... at): int[]

         getNames(String... at): String[]

         toTableSchema(): TableSchema
}

What do others think?

Thanks,
Timo



On 02.04.20 12:28, Dawid Wysakowicz wrote:

Generally +1

One slight concern I have is about the |SupportsProjectionPushDown.|I
don't necessarily understand how can we express projections with
TableSchema. It's unclear for me what happens when a type of a field
changes, fields are in a different order, when types do not match. How
do we express projection of a nested field with TableSchema?

I don't think this changes the core design presented in the FLIP,
therefore I'm fine with accepting the FLIP. I wanted to mention my
concerns, so that maybe we can adjust the passed around structures

slightly.


Best,

Dawid
||

On 30/03/2020 14:42, Leonard Xu wrote:

+1(non-binding)

Best,
Leonard Xu

在 2020年3月30日，16:43，Jingsong Li<[email protected]>  写道：

+1

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[email protected]>  wrote:

+1

Best,
Kurt


On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[email protected]>

wrote:

+1 (non-binding)

Jark Wu<[email protected]>  于2020年3月30日周一 下午3:57写道：

+1 from my side.

Thanks Timo for driving this.

Best,
Jark

On Mon, 30 Mar 2020 at 15:36, Timo Walther<[email protected]>

wrote:

Hi all,

I would like to start the vote for FLIP-95 [1], which is discussed

and

reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an
objection or not enough votes.

Thanks,
Timo

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces

[2]

https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E

--

Benchao Li
School of Electronics Engineering and Computer Science, Peking

University

Tel:+86-15650713730
Email:[email protected];[email protected]

--
Best, Jingsong Lee

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Reply via email to