+1 for CREATE EXTERNAL TABLE.  It is a good balance between the general SQL
expectation of having tables as an abstraction and reinforcing that Beam
does not store your data.

On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <[email protected]> wrote:

> >  I think users will be more confused to find that 'CREATE TABLE' doesn't
> exist then to learn that it might not always create a table.
>
> >> I think that having CREATE TABLE do something unexpected or not do
> something expected (or do the opposite things depending on the table type
> or some flag) is worse than having users look up the correct way of
> creating a data source in Beam SQL without expecting something we don't
> promise.
>
> I agree on this. Enforcing users to look up documentation for the correct
> way is better than letting them use an ambiguous way that could fail their
> expectation.
>
>
> -Rui
>
> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <[email protected]> wrote:
>
>> I think that something unique along the lines of `REGISTER EXTERNAL DATA
>> SOURCE` is probably fine, as it doesn't conflict with existing behaviors of
>> other dialects.
>>
>> > There is a lot of value in making sure our common operations closely
>> map to the equivalent common operations in other SQL dialects.
>>
>> We're trying to make opposite points using the same arguments :) A lot of
>> popular dialects make difference between CREATE TABLE and CREATE EXTERNAL
>> TABLE (or similar):
>>  - T-SQL:
>>       create:
>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
>>       create external:
>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
>>       external datasource:
>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
>>  - PL/SQL:
>>       create:
>> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
>>       create external:
>> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
>>  - postgres:
>>       import foreign schema:
>> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
>>       create table:
>> https://www.postgresql.org/docs/9.1/static/sql-createtable.html
>>  - redshift:
>>       create external schema:
>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
>>       create table:
>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
>>  - hive internal and external:
>> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
>>
>> My understanding is that the behavior of create table is somewhat similar
>> in all of the above dialects, from the high-level perspective it usually
>> creates a persistent table in the current storage context (database).
>> That's not what Beam SQL's create table does right now, and my opinion is
>> that it should not be called create table for this reason.
>>
>> >  I think users will be more confused to find that 'CREATE TABLE'
>> doesn't exist then to learn that it might not always create a table.
>>
>> I think that having CREATE TABLE do something unexpected or not do
>> something expected (or do the opposite things depending on the table type
>> or some flag) is worse than having users look up the correct way of
>> creating a data source in Beam SQL without expecting something we don't
>> promise.
>>
>> >  (For example, a user guessing at the syntax of CREATE TABLE would have
>> a better experience with the error being "field LOCATION not specified"
>> rather than "operation CREATE TABLE not found".)
>>
>> They have to look it up anyway (what format is location for a Pubsub
>> topic? or is it a subscription?), and when doing so I think it would be
>> less confusing to read that to get data from Pubsub/Kafka/... in Beam SQL
>> you have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE
>> TABLE`.
>>
>> External tables and schemas don't have a standard approach and I don't
>> have a strong preference between any one from the above.
>>
>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <[email protected]> wrote:
>>
>>> Adding dev@ back now.
>>>
>>> -Rui
>>>
>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <[email protected]>
>>> wrote:
>>>
>>>> Did we drop the dev list from this on purpose? (I haven't added it
>>>> back, but we probably should.)
>>>>
>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE
>>>> SCHEMA' if there is only to be one option. Sticking with those names
>>>> minimizes both our deviation from other implementations and user surprise.
>>>> There is a lot of value in making sure our common operations closely map to
>>>> the equivalent common operations in other SQL dialects. I think users will
>>>> be more confused to find that 'CREATE TABLE' doesn't exist then to learn
>>>> that it might not always create a table. This minimizes the overhead of
>>>> learning our dialect of SQL and maximizes the odds that a user will be able
>>>> to guess at the syntax of something and have it work. (For example, a user
>>>> guessing at the syntax of CREATE TABLE would have a better experience with
>>>> the error being "field LOCATION not specified" rather than "operation
>>>> CREATE TABLE not found".)
>>>>
>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA
>>>> SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names
>>>> remove the ambiguity around the operation creating and the data source
>>>> being a table.
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <[email protected]> wrote:
>>>>
>>>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE
>>>>> EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>>>  - user friendliness, matching expectations, readability. Current
>>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to
>>>>> users familiar with SQL who expect that `CREATE TABLE` will actually 
>>>>> create
>>>>> a table;
>>>>>  - forward-compatibility. We could potentially support non-external
>>>>> `CREATE TABLE` at some point in the future, whatever semantics it might
>>>>> have. It will be wrong to use the same syntax for external and 
>>>>> non-external
>>>>> CREATEs;
>>>>>
>>>>> I agree that typing extra word each time is not ideal, but my opinion
>>>>> is on the side that readability of code (including SQL) is important (how
>>>>> much time you spend reading / understanding code vs writing it) and we
>>>>> should try to improve it if we can. In case of DDL every non-trivial
>>>>> statement will already have a ton of unavoidable words (field names, 
>>>>> types,
>>>>> location, options) so I would argue that adding extra one word would not
>>>>> noticeably reduce your happiness of writing it :) But it would improve
>>>>> readability and reduce ambiguity, which I think is worth it.
>>>>>
>>>>> I think that making it optional only introduces more confusion (e.g.
>>>>> what's the difference between the two DDL statements without reading the
>>>>> doc?) and would make situation worse.
>>>>>
>>>>> Regards,
>>>>> Anton
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to
>>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making
>>>>>> `EXTERNAL` as optional?
>>>>>>
>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <[email protected]
>>>>>> > wrote:
>>>>>>
>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from
>>>>>>> a documentation prospective, but I'd be really unhappy if I had to type 
>>>>>>> out
>>>>>>> 'EXTERNAL' every time. (I have the same concern with 'CREATE EXTERNAL
>>>>>>> SCHEMA'.)
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> I know you are probably using CREATE TABLE, Can I know your
>>>>>>>> thoughts on this?
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be
>>>>>>>>> better.
>>>>>>>>>
>>>>>>>>> "create external" is being widely used by different systems with
>>>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that
>>>>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>>>>
>>>>>>>>>> Do we have any terms that specify connection to external table?
>>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be 
>>>>>>>>>> a new
>>>>>>>>>> table created. Adding "EXTERNAL" would already add distinction, but 
>>>>>>>>>> adding
>>>>>>>>>> something more explicit for the task might be even better.
>>>>>>>>>>
>>>>>>>>>> --Mikhail
>>>>>>>>>>
>>>>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We
>>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE 
>>>>>>>>>>> (a-la
>>>>>>>>>>> T-SQL
>>>>>>>>>>> <https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017>),
>>>>>>>>>>> if it's somehow advantageous for us to leverage access patterns or 
>>>>>>>>>>> restrict
>>>>>>>>>>> DML statements.
>>>>>>>>>>>
>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Community,
>>>>>>>>>>>>
>>>>>>>>>>>> BeamSQL allows CREATE TABLE
>>>>>>>>>>>> <https://beam.apache.org/documentation/dsls/sql/create-table/>
>>>>>>>>>>>> statements to register virtual tables from external storage 
>>>>>>>>>>>> systems (e.g.
>>>>>>>>>>>> BigQuery).
>>>>>>>>>>>>
>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by
>>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be 
>>>>>>>>>>>> registered by
>>>>>>>>>>>> "CREATE EXTERNAL TABLE", which requires the user to provide a 
>>>>>>>>>>>> LOCATION and
>>>>>>>>>>>> BeamSQL will register the table outside of current execution 
>>>>>>>>>>>> environment
>>>>>>>>>>>> based on LOCATION.
>>>>>>>>>>>>
>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in
>>>>>>>>>>>> BeamSQL to help users understand they are registering tables, and 
>>>>>>>>>>>> BeamSQL
>>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at 
>>>>>>>>>>>> least on
>>>>>>>>>>>> some storage systems, if not all).
>>>>>>>>>>>>
>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>>>
>>>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>>>
>>>>>>>>>>>> Pros:
>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE
>>>>>>>>>>>> TABLE.
>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the
>>>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only 
>>>>>>>>>>>> register
>>>>>>>>>>>> tables in BeamSQL.
>>>>>>>>>>>>
>>>>>>>>>>>> Cons:
>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing
>>>>>>>>>>>> BeamSQL pipelines which has CREATE TABLE require changes.
>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time,
>>>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>>>
>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed
>>>>>>>>>>>> pros and cons above.
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required
>>>>>>>>>>>> or optional?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Rui
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
>>>>>>
>>>>>

Reply via email to