Hi everyone,

thanks for the great discussion with a good outcome.

I have one last comment about:

void createTemporaryFunction(String path, UserDefinedFunction function);

We should encourage users to register a class instead of an instance. We should enforce a default constructor in the future. If we need metadata or type inference, the planner can simply instantiate the class and call the corresponding getters. If people would like to parameterize a function, they can use global job parameters instead - which are available in the open() method.

I suggest:

void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass);

This is also closer to the SQL DDL that is about to be implemented in:
https://issues.apache.org/jira/browse/FLINK-7151

Thanks,
Timo


On 09.10.19 17:07, Bowen Li wrote:
Hi Dawid,

+1 for proposed changes

On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <dwysakow...@apache.org>
wrote:

Sorry for a very delayed response.

@Kurt Yes, this is the goal to have a function created like new
Function(...) also be wrapped into CatalogFunction. This would have to
be though a temporary function as we cannot represent that as a set of
properties. Similar to the createTemporaryView(DataStream stream).

As for the ConnectTableDescriptor I agree this is very similar to
CatalogTable. I am not sure though if we should get rid of it. In the
end I see it as a builder for a CatalogTable, which is a slightly more
internal API, but we might revisit that some time in the future if we
find that it makes more sense.

@All I updated the FLIP page with some more details from the outcome of
the discussions around FLIP-57. Please take a look. I would like to
start a vote on this FLIP as soon as the vote on FLIP-57 goes through.

Best,

Dawid


On 19/09/2019 09:24, Kurt Young wrote:
IIUC it's good to see that both serializable (tables description from
DDL)
and unserializable (tables with DataStream underneath) tables are treated
unify with CatalogTable.

Can I also assume functions that either come from a function class (from
DDL)
or function objects (newed by user) will also treated unify with
CatalogFunction?

This will greatly simplify and unify current API level concepts and
design.
And it seems only one thing left, how do we deal with
ConnectTableDescriptor?
It's actually very similar with serializable CatalogTable, both carry
some
text
properties which even are the same. Is there any chance we can further
unify
this to CatalogTable?

object
Best,
Kurt


On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> wrote:

Thanks Dawid for the design doc.

In general, I’m +1 to the FLIP.


+1 to the single-string and parse way to express object path.

+1 to deprecate registerTableSink & registerTableSource.
But I would suggest to provide an easy way to register a custom
source/sink before we drop them (this is another story).
Currently, it’s not easy to implement a custom connector descriptor.

Best,
Jark


在 2019年9月19日,11:37,Dawid Wysakowicz <wysakowicz.da...@gmail.com> 写道:

Hi JingsongLee,
 From my understanding they can. Underneath they will be CatalogTables.
The
difference is the lifetime of the tables. Plus some of the user facing
interfaces cannot be persisted e.g. datastream. Therefore we must have
a
separate methods for that. In the end the temporary tables are held in
memory as CatalogTables.
Best,
Dawid

On Thu, 19 Sep 2019, 10:08 JingsongLee, <lzljs3620...@aliyun.com
.invalid>
wrote:

Hi dawid:
Can temporary tables achieve the same capabilities as catalog table?
like statistics: CatalogTableStatistics, CatalogColumnStatistics,
PartitionStatistics
like partition support: we have added some catalog equivalent
interfaces
on TableSource/TableSink: getPartitions, getPartitionFieldNames
Maybe it's not a good idea to add these interfaces to
TableSource/TableSink. What do you think?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Kurt Young <ykt...@gmail.com>
Send Time:2019年9月18日(星期三) 17:54
To:dev <dev@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
module

Hi all,

Sorry to join this party late. Big +1 to this flip, especially for the
dropping
"registerTableSink & registerTableSource" part. These are indeed
legacy
and we should try to unify them through CatalogTable after we
introduce
the concept of Catalog.

 From my understanding, what we can registered should all be metadata,
TableSource/TableSink should only be the one who is responsible to do
the real work, i.e. reading and writing data according to the schema
and
other information like computed column, partition, .e.g.

Best,
Kurt


On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <lzljs3620...@aliyun.com
.invalid>
wrote:

After some development and thinking, I have a general understanding.
+1 to registering a source/sink does not fit into the SQL world.
I am OK to have a deprecated registerTemporarySource/Sink to
compatible
with old ways.

Best,
Jingsong Lee


------------------------------------------------------------------
From:Timo Walther <twal...@apache.org>
Send Time:2019年9月17日(星期二) 08:00
To:dev <dev@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
module

Hi Dawid,

thanks for the design document. It fixes big concept gaps due to
historical reasons with proper support for serializability and
catalog
support in mind.

I would not mind a registerTemporarySource/Sink, but the problem
that I
see is that many people think that this is the recommended way of
registering a table source/sink which is not true. We should guide
users
to either use connect() or DDL API which can be validated and stored
in
catalog.

Also from a concept perspective, registering a source/sink does not
fit
into the SQL world. SQL does not know about source/sinks but only
about
tables. If the responsibility of a TableSource/TableSink is just a
pure
physical data consumer/producer that is not connected to the actual
logical table schema, we would need a possibility of defining time
attributes and interpreting/converting a changelog. This should be
done
by the framework with information from the DDL/connect() and not be
defined in every table source.

Regards,
Timo


On 09.09.19 14:16, JingsongLee wrote:
Hi dawid:

It is difficult to describe specific examples.
Sometimes users will generate some java converters through some
  Java code, or generate some Java classes through third-party
  libraries. Of course, these can be best done through properties.
But this requires additional work from users.My suggestion is to
  keep this Java instance class way that is user-friendly.

Best,
Jingsong Lee


------------------------------------------------------------------
From:Dawid Wysakowicz <dwysakow...@apache.org>
Send Time:2019年9月6日(星期五) 16:21
To:dev <dev@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
Table
module
Hi all,
@Jingsong Could you elaborate a bit more what do you mean by
"some Connectors are difficult to convert all states to properties"
All the Flink provided connectors will definitely be expressible
with
properties (In the end you should be able to use them from DDL). I
think
if
a TableSource is complex enough that it handles filter push down,
partition
support etc. should rather be made available both from DDL &
java/scala
code. I'm happy to reconsider adding registerTemporaryTable(String
path,
TableSource source) if you have some concrete examples in mind.
@Xuefu: We also considered the ObjectIdentifier (or actually
introducing
a new identifier representation to differentiate between resolved and
unresolved identifiers) with the same concerns. We decided to suggest
the
string & parsing logic because of usability.
     tEnv.from("cat.db.table")
is shorter and easier to write than
     tEnv.from(Identifier.for("cat", "db", "name")
And also implicitly solves the problem what happens if a user (e.g.
used
to other systems) uses that API in a following manner:
     tEnv.from(Identifier.for("db.name")
I'm happy to revisit it if the general consensus is that it's better
to
use the OO aproach.
Best,
Dawid

On 06/09/2019 10:00, Xuefu Z wrote:

Thanks to Dawid for starting the discussion and writeup. It looks
pretty
good to me except that I'm a little concerned about the object
reference
and string parsing in the code, which seems to an anti-pattern to
OOP.
Have
we considered using ObjectIdenitifier with optional catalog and db
parts,
esp. if we are worried about arguments of variable length or method
overloading? It's quite likely that the result of string parsing is
an
ObjectIdentifier instance any way.

Having string parsing logic in the code is a little dangerous as it
duplicates part of the DDL/DML parsing, and they can easily get out
of
sync.
Thanks,
Xuefu

On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <lzljs3620...@aliyun.com
.invalid>
wrote:


Thanks dawid, +1 for this approach.

One concern is the removal of registerTableSink &
registerTableSource
  in TableEnvironment. It has two alternatives:
1.the properties approach (DDL, descriptor).
2.from/toDataStream.

#1 can only be properties, not java states, and some Connectors
  are difficult to convert all states to properties.
#2 can contain java state. But can't use TableSource-related
features,
like project & filter push down, partition support, etc..

Any idea about this?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Dawid Wysakowicz <dwysakow...@apache.org>
Send Time:2019年9月4日(星期三) 22:20
To:dev <dev@flink.apache.org>
Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
module
Hi all,
As part of FLIP-30 a Catalog API was introduced that enables storing
table
meta objects permanently. At the same time the majority of current
APIs
create temporary objects that cannot be serialized. We should
clarify
the
creation of meta objects (tables, views, functions) in a unified
way.
Another current problem in the API is that all the temporary objects
are
stored in a special built-in catalog, which is not very intuitive
for
many
users, as they must be aware of that catalog to reference temporary
objects.
Lastly, different APIs have different ways of providing object
paths:
String path…,
String path, String pathContinued…
String name
We should choose one approach and unify it across all APIs.
I suggest a FLIP to address the above issues.
Looking forward to your opinions.
FLIP link:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module


Reply via email to