I think that something unique along the lines of `REGISTER EXTERNAL DATA
SOURCE` is probably fine, as it doesn't conflict with existing behaviors of
other dialects.
> There is a lot of value in making sure our common operations closely map
to the equivalent common operations in other SQL dialects.
We're trying to make opposite points using the same arguments :) A lot of
popular dialects make difference between CREATE TABLE and CREATE EXTERNAL
TABLE (or similar):
- T-SQL:
create:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
create external:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
external datasource:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
- PL/SQL:
create:
https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
create external:
https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
- postgres:
import foreign schema:
https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
create table:
https://www.postgresql.org/docs/9.1/static/sql-createtable.html
- redshift:
create external schema:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
create table:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
- hive internal and external:
https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
My understanding is that the behavior of create table is somewhat similar
in all of the above dialects, from the high-level perspective it usually
creates a persistent table in the current storage context (database).
That's not what Beam SQL's create table does right now, and my opinion is
that it should not be called create table for this reason.
> I think users will be more confused to find that 'CREATE TABLE' doesn't
exist then to learn that it might not always create a table.
I think that having CREATE TABLE do something unexpected or not do
something expected (or do the opposite things depending on the table type
or some flag) is worse than having users look up the correct way of
creating a data source in Beam SQL without expecting something we don't
promise.
> (For example, a user guessing at the syntax of CREATE TABLE would have a
better experience with the error being "field LOCATION not specified"
rather than "operation CREATE TABLE not found".)
They have to look it up anyway (what format is location for a Pubsub topic?
or is it a subscription?), and when doing so I think it would be less
confusing to read that to get data from Pubsub/Kafka/... in Beam SQL you
have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE
TABLE`.
External tables and schemas don't have a standard approach and I don't have
a strong preference between any one from the above.
On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <[email protected]> wrote:
> Adding dev@ back now.
>
> -Rui
>
> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <[email protected]>
> wrote:
>
>> Did we drop the dev list from this on purpose? (I haven't added it back,
>> but we probably should.)
>>
>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
>> SCHEMA' if there is only to be one option. Sticking with those names
>> minimizes both our deviation from other implementations and user surprise.
>> There is a lot of value in making sure our common operations closely map to
>> the equivalent common operations in other SQL dialects. I think users will
>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
>> that it might not always create a table. This minimizes the overhead of
>> learning our dialect of SQL and maximizes the odds that a user will be able
>> to guess at the syntax of something and have it work. (For example, a user
>> guessing at the syntax of CREATE TABLE would have a better experience with
>> the error being "field LOCATION not specified" rather than "operation
>> CREATE TABLE not found".)
>>
>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names remove
>> the ambiguity around the operation creating and the data source being a
>> table.
>>
>> Andrew
>>
>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <[email protected]> wrote:
>>
>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE
>>> EXTERNAL TABLE` for existing semantics. My main reasons are:
>>> - user friendliness, matching expectations, readability. Current
>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to
>>> users familiar with SQL who expect that `CREATE TABLE` will actually create
>>> a table;
>>> - forward-compatibility. We could potentially support non-external
>>> `CREATE TABLE` at some point in the future, whatever semantics it might
>>> have. It will be wrong to use the same syntax for external and non-external
>>> CREATEs;
>>>
>>> I agree that typing extra word each time is not ideal, but my opinion is
>>> on the side that readability of code (including SQL) is important (how much
>>> time you spend reading / understanding code vs writing it) and we should
>>> try to improve it if we can. In case of DDL every non-trivial statement
>>> will already have a ton of unavoidable words (field names, types, location,
>>> options) so I would argue that adding extra one word would not noticeably
>>> reduce your happiness of writing it :) But it would improve readability and
>>> reduce ambiguity, which I think is worth it.
>>>
>>> I think that making it optional only introduces more confusion (e.g.
>>> what's the difference between the two DDL statements without reading the
>>> doc?) and would make situation worse.
>>>
>>> Regards,
>>> Anton
>>>
>>>
>>>
>>>
>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <[email protected]> wrote:
>>>
>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>>> `EXTERNAL` as optional?
>>>>
>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <[email protected]>
>>>> wrote:
>>>>
>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from a
>>>>> documentation prospective, but I'd be really unhappy if I had to type out
>>>>> 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL
>>>>> SCHEMA'.)
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <[email protected]> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I know you are probably using CREATE TABLE, Can I know your thoughts
>>>>>> on this?
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be
>>>>>>> better.
>>>>>>>
>>>>>>> "create external" is being widely used by different systems with
>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that
>>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>>
>>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be a
>>>>>>>> new
>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but
>>>>>>>> adding
>>>>>>>> something more explicit for the task might be even better.
>>>>>>>>
>>>>>>>> --Mikhail
>>>>>>>>
>>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Strictly speaking, they are not necessarily tables either. We
>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE (a-la
>>>>>>>>> T-SQL
>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or
>>>>>>>>> restrict
>>>>>>>>> DML statements.
>>>>>>>>>
>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>
>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Community,
>>>>>>>>>>
>>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>>> statements to register virtual tables from external storage systems
>>>>>>>>>> (e.g.
>>>>>>>>>> BigQuery).
>>>>>>>>>>
>>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be registered
>>>>>>>>>> by
>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a
>>>>>>>>>> LOCATION and
>>>>>>>>>> BeamSQL will register the table outside of current execution
>>>>>>>>>> environment
>>>>>>>>>> based on LOCATION.
>>>>>>>>>>
>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL
>>>>>>>>>> to help users understand they are registering tables, and BeamSQL
>>>>>>>>>> does not
>>>>>>>>>> create non existing tables by running CREATE TABLE (at least on some
>>>>>>>>>> storage systems, if not all).
>>>>>>>>>>
>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>
>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>
>>>>>>>>>> Pros:
>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>>> TABLE.
>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the
>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only
>>>>>>>>>> register
>>>>>>>>>> tables in BeamSQL.
>>>>>>>>>>
>>>>>>>>>> Cons:
>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL
>>>>>>>>>> pipelines which has CREATE TABLE require changes.
>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>
>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed
>>>>>>>>>> pros and cons above.
>>>>>>>>>>
>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required or
>>>>>>>>>> optional?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rui
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>>
>>>> --
>>>> ----
>>>> Mingmin
>>>>
>>>