[GitHub] [flink] bowenli86 commented on issue #8007: [FLINK-11474][table] Add ReadableCatalog, ReadableWritableCatalog, and other …

GitBox Thu, 11 Apr 2019 00:46:27 -0700

bowenli86 commented on issue #8007: [FLINK-11474][table] Add ReadableCatalog, 
ReadableWritableCatalog, and other …
URL: https://github.com/apache/flink/pull/8007#issuecomment-482005765
 
 
   Hi all,
   
   I believe @sunjincheng121 and @hequn8128 brought up valuable suggestions for 
avoiding confusing API names. Xuefu also made very good points about API design 
and impl, and that javadoc should be the source of truth for understanding APIs.
   
   Previously I may have been overly influenced by Hive's design, given that I've 
been working heavily on integrating Flink and Hive. @sunjincheng121's concerns, 
if I understand correctly, come from the fact that these APIs will be used not 
only by SQL users but also by Table API users, who may not have a Hive background 
and are thus more easily confused. So I tried to step out of the Hive context and 
inspect these APIs from the perspective of their usage, referencing MySQL, 
Postgres, Oracle, SQL Server, and Hive. Here are my thoughts:
   
   On the reading side, a view is always treated as a logical table. In queries 
(SELECT in standard SQL DML), a view is a table: the 'FROM' clause is always 
"FROM x" rather than "FROM `TABLE/VIEW` x". It's the planner's responsibility to 
process views specially. The same holds for meta commands issued without extra 
params: "DESCRIBE" doesn't distinguish them. Listing tables usually comes in two 
syntaxes, "SHOW TABLES" and "SELECT * FROM meta", and both return tables and 
views. Listing only views takes a different command or extra params, like "SHOW 
VIEWS" and "SELECT * FROM meta WHERE type='view'".
   
   On the writing side, a view is treated differently from a table, given that 
their representations are a bit different (though they share some common 
fields). DDL, especially CREATE and ALTER, always requires specifying either 
`TABLE` or `VIEW`, as in "CREATE/ALTER `TABLE/VIEW` x". "DROP/RENAME" don't touch 
the fields inside a table or view, so their impl behind the scenes is usually 
the same, and therefore some databases choose not to require the `TABLE/VIEW` 
keyword, but I think it really depends on the developers. Since our devs feel 
strongly that omitting it causes confusion, we can require the keywords in our 
APIs and Flink SQL.
   
   I think we should avoid a design in which a SQL statement is translated into 
multiple catalog API calls or requires unnecessary extra processing. With that 
in mind, and given the above conclusions (please correct me if anything above is 
wrong), I propose the following solution: `ReadableCatalog` APIs should treat 
views as tables by default if no extra params are specified, so `getTable()` and 
`listTables()` operate on both tables and views, and we will have individual 
APIs such as `listPhysicalTables()`, `listViews()`, and potentially 
`listMaterializedViews()` in the future. `ReadableWritableCatalog` APIs should 
treat views and tables differently, and thus have separate 
create/alter/drop/rename APIs for views and tables, e.g. `dropTable()` and 
`dropView()`, even though the two will very likely share the same code. We will 
also add clear javadoc and Flink documentation for all catalog APIs in a 
separate PR. This way, we can eliminate confusion and still maintain a 
one-to-one mapping between SQL statements and catalog APIs.
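   
   To make the proposal concrete, here is a rough sketch of how the split could 
look. It is only an illustration, not the code in this PR: the signatures are 
simplified to plain `String` names, and `Kind`, `Entry`, `InMemoryCatalog`, and 
the choice to throw when `dropTable()` is called on a view are all assumptions 
made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CatalogSketch {

    // Hypothetical marker for what a catalog entry is.
    enum Kind { TABLE, VIEW }

    // Hypothetical, heavily simplified stand-in for table/view metadata.
    static final class Entry {
        final String name;
        final Kind kind;
        Entry(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    // Read side: views are treated as tables unless a narrower method is used.
    interface ReadableCatalog {
        Entry getTable(String name);        // resolves tables and views alike
        List<String> listTables();          // tables and views
        List<String> listPhysicalTables();  // tables only
        List<String> listViews();           // views only
    }

    // Write side: one method per DDL statement, split by TABLE/VIEW.
    interface ReadableWritableCatalog extends ReadableCatalog {
        void createTable(String name);
        void createView(String name);
        void dropTable(String name);
        void dropView(String name);
    }

    // Assumed in-memory implementation, only to show the intended behavior.
    static final class InMemoryCatalog implements ReadableWritableCatalog {
        private final Map<String, Entry> entries = new LinkedHashMap<>();

        @Override public Entry getTable(String name) {
            Entry e = entries.get(name);
            if (e == null) {
                throw new IllegalArgumentException("No such table or view: " + name);
            }
            return e;
        }
        @Override public List<String> listTables() { return new ArrayList<>(entries.keySet()); }
        @Override public List<String> listPhysicalTables() { return namesOf(Kind.TABLE); }
        @Override public List<String> listViews() { return namesOf(Kind.VIEW); }
        @Override public void createTable(String name) { create(name, Kind.TABLE); }
        @Override public void createView(String name) { create(name, Kind.VIEW); }
        @Override public void dropTable(String name) { drop(name, Kind.TABLE); }
        @Override public void dropView(String name) { drop(name, Kind.VIEW); }

        private List<String> namesOf(Kind kind) {
            List<String> names = new ArrayList<>();
            for (Entry e : entries.values()) {
                if (e.kind == kind) names.add(e.name);
            }
            return names;
        }

        private void create(String name, Kind kind) {
            if (entries.containsKey(name)) {
                throw new IllegalArgumentException("Already exists: " + name);
            }
            entries.put(name, new Entry(name, kind));
        }

        // Shared by dropTable() and dropView(); only the expected kind differs.
        private void drop(String name, Kind expected) {
            Entry e = getTable(name);
            if (e.kind != expected) {
                throw new IllegalArgumentException(name + " is not a " + expected);
            }
            entries.remove(name);
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.createTable("orders");
        catalog.createView("big_orders");
        System.out.println("listTables:         " + catalog.listTables());
        System.out.println("listPhysicalTables: " + catalog.listPhysicalTables());
        System.out.println("listViews:          " + catalog.listViews());
        catalog.dropView("big_orders");     // maps 1:1 to "DROP VIEW big_orders"
        System.out.println("after dropView:     " + catalog.listTables());
    }
}
```

   In this sketch each DDL statement maps to exactly one catalog call, and the 
read-side methods never force the caller to know whether a name refers to a 
table or a view.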
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
