[
https://issues.apache.org/jira/browse/FLINK-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880196#comment-15880196
]
Fabian Hueske commented on FLINK-5568:
--------------------------------------
Hi [~jinyu.zj] and [~ykt836], thanks for this proposal.
Connecting to external catalogs would be a great feature!
I like the idea of generating the {{TableSource}} using a converter. I also
think that generating {{TableSources}} which are independent of the catalog is
a good decision (it also makes it easier to implement connectors to catalogs
that do not offer direct access to the data). This decision does not
mean that we could not have an {{HCatalogTableSource}} (might be a good idea
for formats that we do not natively support yet).
I have a few questions on details of the approach:
- I did not completely understand the role of {{ExternalCatalogTable}}. I
assume it would be generated by the catalog interface. Does it directly extend
{{FlinkTable}} or would it be converted into a {{TableSourceTable}} which holds
the {{TableSource}} that we generated by the converter?
- Does {{partitionColumnNames}} need to be a top level member of the
{{ExternalCatalogTable}} or does it make sense to move it into the
{{properties}}? Not all TableSources support partitions, and those that do could
get the info from the {{properties}} just like all source-specific properties.
- Do you want to implement a new schema class? We already have to deal with
Flink's (fieldnames + fieldtypes as TypeInfo) and Calcite's representations and
already have lots of tooling for converting between the two. I think it would be
good to choose one of the two and not implement a new one.
- How would you identify a {{TableSource}} that is annotated with
{{@ExternalCatalogCompatible}}? I assume we would need to scan the classpath
for that. Any plans for how to do that?
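To make the second question concrete, here is a minimal sketch of how a partition-aware source could read its partition columns from the generic {{properties}} map instead of a dedicated top-level member. The property key {{partition.columns}}, the comma-separated encoding, and the object and method names are all assumptions made for illustration; none of them is part of the proposal.

```scala
object PartitionPropertiesSketch {
  // Hypothetical property key; not defined anywhere in the proposal.
  val PartitionColumnsKey = "partition.columns"

  // Reads a comma-separated list of partition column names from the
  // properties map. Sources that do not support partitions simply never
  // call this; a missing or empty entry yields an empty list.
  def partitionColumns(properties: Map[String, String]): List[String] =
    properties
      .get(PartitionColumnsKey)
      .map(_.split(',').map(_.trim).filter(_.nonEmpty).toList)
      .getOrElse(Nil)
}
```

With this encoding, the catalog table itself would stay free of partition-specific members, at the cost of each source having to agree on the key and format.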
Thanks, Fabian
> Introduce interface for catalog, and provide an in-memory implementation, and
> integrate with calcite schema
> -----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-5568
> URL: https://issues.apache.org/jira/browse/FLINK-5568
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: Kurt Young
> Assignee: jingzhang
>
> The {{TableEnvironment}} now provides a mechanism to register temporary
> tables. It registers the temporary table in the Calcite catalog, so SQL and
> Table API queries can access those tables. Currently DatasetTable,
> DataStreamTable and TableSourceTable can be registered with the
> {{TableEnvironment}} as temporary tables.
> This issue aims to provide a mechanism to connect external catalogs such as
> HCatalog to the {{TableEnvironment}}, so SQL and Table API queries can
> access tables in the external catalogs without registering those tables with
> the {{TableEnvironment}} beforehand.
> First, we should point out that there are actually two kinds of catalog in
> Flink.
> The first is the external catalog mentioned above; it provides CRUD
> operations on databases/tables.
> The second is the Calcite catalog; it defines the namespace that can be
> accessed in Calcite queries and depends on Calcite's Schema/Table
> abstraction. SqlValidator and SqlConverter depend on the Calcite catalog to
> fetch the tables referenced in SQL or Table API queries.
> So we need to do the following things:
> 1. Introduce an interface for the external catalog, and perhaps provide an
> in-memory implementation first for test and development environments.
> 2. Introduce a mechanism to connect the external catalog with the Calcite
> catalog so that the tables/databases in the external catalog can be accessed
> through the Calcite catalog. This includes converting the databases of the
> external catalog to Calcite sub-schemas and converting the tables in a
> database of the external catalog to Calcite tables (only
> {{TableSourceTable}} is supported).
> 3. Register the external catalog with the {{TableEnvironment}}.
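Step 1 above could look roughly like the following sketch of an external catalog interface with an in-memory implementation. The trait name, the method names, and the simplified {{CatalogTable}} entry are assumptions made for illustration only; the issue text does not specify the interface, and the Calcite integration of step 2 is deliberately left out.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a catalog table entry.
final case class CatalogTable(dbName: String, tableName: String, tableType: String)

// Hypothetical external catalog interface: CRUD operations on databases and tables.
trait ExternalCatalog {
  def createDatabase(dbName: String): Unit
  def dropDatabase(dbName: String): Unit
  def listDatabases(): List[String]
  def createTable(table: CatalogTable): Unit
  def dropTable(dbName: String, tableName: String): Unit
  def getTable(dbName: String, tableName: String): Option[CatalogTable]
  def listTables(dbName: String): List[String]
}

// In-memory implementation intended for tests and development only.
class InMemoryExternalCatalog extends ExternalCatalog {
  // database name -> (table name -> table), kept in insertion order
  private val databases =
    mutable.LinkedHashMap.empty[String, mutable.LinkedHashMap[String, CatalogTable]]

  override def createDatabase(dbName: String): Unit = {
    require(!databases.contains(dbName), s"Database '$dbName' already exists")
    databases(dbName) = mutable.LinkedHashMap.empty[String, CatalogTable]
  }

  override def dropDatabase(dbName: String): Unit = {
    databases.remove(dbName)
    ()
  }

  override def listDatabases(): List[String] = databases.keys.toList

  override def createTable(table: CatalogTable): Unit = {
    val db = databases.getOrElse(
      table.dbName,
      throw new NoSuchElementException(s"Database '${table.dbName}' does not exist"))
    db(table.tableName) = table
  }

  override def dropTable(dbName: String, tableName: String): Unit = {
    databases.get(dbName).foreach(_.remove(tableName))
  }

  override def getTable(dbName: String, tableName: String): Option[CatalogTable] =
    databases.get(dbName).flatMap(_.get(tableName))

  override def listTables(dbName: String): List[String] =
    databases.get(dbName).map(_.keys.toList).getOrElse(Nil)
}
```

A bridge to Calcite would then expose each database returned by {{listDatabases()}} as a sub-schema and resolve table lookups through {{getTable}}.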
> Here is the design model of {{ExternalCatalogTable}}.
> | identifier | TableIdentifier | dbName and tableName of table |
> | tableType | String | type of external catalog table, e.g. csv, hbase, kafka |
> | schema | DataSchema | schema of table data, including column names and column types |
> | partitionColumnNames | List<String> | names of partition column |
> | properties | Map<String, String> | properties of external catalog table |
> | stats | TableStats | statistics of external catalog table |
> | comment | String | |
> | create time | long | |
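Read as code, the table above could be sketched as a set of Scala case classes. The field names and top-level types follow the table; the contents of {{TableIdentifier}}, {{DataSchema}} and {{TableStats}}, the default values, and the use of {{Option}} for the statistics are assumptions, since the issue does not define them.

```scala
// dbName and tableName of the table, as described in the table above.
final case class TableIdentifier(dbName: String, tableName: String)

// Assumption: column types are shown as plain strings here; a real schema
// would use Flink's or Calcite's type representation instead.
final case class DataSchema(columnNames: List[String], columnTypes: List[String]) {
  require(
    columnNames.length == columnTypes.length,
    "columnNames and columnTypes must have the same length")
}

// Assumption: the issue does not say what TableStats contains.
final case class TableStats(rowCount: Long)

final case class ExternalCatalogTable(
    identifier: TableIdentifier,
    tableType: String, // e.g. "csv", "hbase", "kafka"
    schema: DataSchema,
    partitionColumnNames: List[String] = Nil,
    properties: Map[String, String] = Map.empty,
    stats: Option[TableStats] = None, // Option is an assumption: stats may be unknown
    comment: String = "",
    createTime: Long = System.currentTimeMillis())
```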
> One detail still needs to be taken into consideration: how to convert an
> {{ExternalCatalogTable}} to a {{TableSourceTable}}. The question is
> equivalent to converting an {{ExternalCatalogTable}} to a {{TableSource}},
> because we can easily get a {{TableSourceTable}} from a {{TableSource}}.
> Different {{TableSource}} implementations often need different fields to
> create an instance. E.g. {{CsvTableSource}} needs path, fieldName,
> fieldTypes, fieldDelim, rowDelim and so on, while {{KafkaTableSource}} needs
> configuration and tableName. So it is not a good idea to make the Flink
> framework responsible for translating an {{ExternalCatalogTable}} to the
> different kinds of {{TableSourceTable}}.
> Here is one solution: let each {{TableSource}} specify a converter.
> 1. Provide an annotation named {{ExternalCatalogCompatible}}. A
> {{TableSource}} with this annotation is compatible with the external
> catalog, that is, it can be converted to or from an ExternalCatalogTable. The
> annotation specifies the tableType and the converter of the TableSource. For
> example, for {{CsvTableSource}}, it specifies that the tableType is csv and
> the converter class is CsvTableSourceConverter.
> {code}
> @ExternalCatalogCompatible(tableType = "csv", converter =
> classOf[CsvTableSourceConverter])
> class CsvTableSource(...) {
> ...}
> {code}
> 2. Scan all TableSources with the ExternalCatalogCompatible annotation and
> save the tableType and converter in a Map.
> 3. When an {{ExternalCatalogTable}} needs to be converted to a
> {{TableSource}}, look up the converter by tableType and let it perform the
> conversion.
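Steps 2 and 3 above amount to a lookup table from tableType to converter. The sketch below illustrates only that lookup: the registry is filled by hand rather than by scanning the classpath for the annotation, and every type in it ({{CatalogEntry}}, {{CsvSource}}, {{TableSourceConverter}}, {{ConverterRegistry}}) is a simplified, hypothetical stand-in rather than an actual Flink class.

```scala
object ConverterSketch {
  // Trimmed-down stand-in for ExternalCatalogTable: only what the lookup needs.
  final case class CatalogEntry(tableType: String, properties: Map[String, String])

  // Stand-ins for Flink's TableSource and CsvTableSource.
  trait TableSource
  final case class CsvSource(path: String, fieldDelim: String) extends TableSource

  // Hypothetical converter contract: each source type knows how to build
  // itself from a catalog entry.
  trait TableSourceConverter {
    def fromExternalCatalogTable(table: CatalogEntry): TableSource
  }

  object CsvSourceConverter extends TableSourceConverter {
    override def fromExternalCatalogTable(table: CatalogEntry): TableSource = {
      // "path" and "fieldDelim" are assumed property keys.
      val path = table.properties.getOrElse(
        "path",
        throw new IllegalArgumentException("csv table requires a 'path' property"))
      CsvSource(path, table.properties.getOrElse("fieldDelim", ","))
    }
  }

  // Step 2: tableType -> converter map (registered explicitly in this sketch).
  // Step 3: pick the converter by tableType and let it do the conversion.
  class ConverterRegistry(converters: Map[String, TableSourceConverter]) {
    def convert(table: CatalogEntry): Either[String, TableSource] =
      converters.get(table.tableType) match {
        case Some(converter) => Right(converter.fromExternalCatalogTable(table))
        case None => Left(s"No converter registered for table type '${table.tableType}'")
      }
  }
}
```

Returning an {{Either}} for unknown table types is a design choice of this sketch; throwing an exception would work equally well. How the map is populated (annotation scan or otherwise) is independent of the lookup itself.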