+1 for CREATE EXTERNAL TABLE with similar reasoning given by others on this 
thread.

Tim

> On 15 Aug 2018, at 23:01, Charles Chen <[email protected]> wrote:
> 
> +1 for CREATE EXTERNAL TABLE.  It is a good balance between the general SQL 
> expectation of having tables as an abstraction and reinforcing that Beam does 
> not store your data.
> 
>> On Wed, Aug 15, 2018 at 1:58 PM Rui Wang <[email protected]> wrote:
>> >  I think users will be more confused to find that 'CREATE TABLE' doesn't 
>> > exist then to learn that it might not always create a table. 
>> 
>> >> I think that having CREATE TABLE do something unexpected or not do 
>> >> something expected (or do the opposite things depending on the table type 
>> >> or some flag) is worse than having users look up the correct way of 
>> >> creating a data source in Beam SQL without expecting something we don't 
>> >> promise.
>> 
>> I agree on this. Enforcing users to look up documentation for the correct 
>> way is better than letting them use an ambiguous way that could fail their 
>> expectation.
>> 
>> 
>> -Rui
>> 
>>> On Wed, Aug 15, 2018 at 1:46 PM Anton Kedin <[email protected]> wrote:
>>> I think that something unique along the lines of `REGISTER EXTERNAL DATA 
>>> SOURCE` is probably fine, as it doesn't conflict with existing behaviors of 
>>> other dialects.
>>> 
>>> > There is a lot of value in making sure our common operations closely map 
>>> > to the equivalent common operations in other SQL dialects. 
>>> 
>>> We're trying to make opposite points using the same arguments :) A lot of 
>>> popular dialects make difference between CREATE TABLE and CREATE EXTERNAL 
>>> TABLE (or similar):
>>>  - T-SQL:
>>>       create: 
>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
>>>       create external: 
>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
>>>       external datasource: 
>>> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
>>>  - PL/SQL:
>>>       create: 
>>> https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables003.htm#i1106369
>>>       create external: 
>>> https://docs.oracle.com/cd/B19306_01/server.102/b14215/et_concepts.htm#i1009127
>>>  - postgres:
>>>       import foreign schema: 
>>> https://www.postgresql.org/docs/9.5/static/sql-importforeignschema.html
>>>       create table: 
>>> https://www.postgresql.org/docs/9.1/static/sql-createtable.html
>>>  - redshift:
>>>       create external schema: 
>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html
>>>       create table: 
>>> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html
>>>  - hive internal and external: 
>>> https://www.dezyre.com/hadoop-tutorial/apache-hive-tutorial-tables
>>> 
>>> My understanding is that the behavior of create table is somewhat similar 
>>> in all of the above dialects, from the high-level perspective it usually 
>>> creates a persistent table in the current storage context (database). 
>>> That's not what Beam SQL's create table does right now, and my opinion is 
>>> that it should not be called create table for this reason.
>>> 
>>> >  I think users will be more confused to find that 'CREATE TABLE' doesn't 
>>> > exist then to learn that it might not always create a table. 
>>> 
>>> I think that having CREATE TABLE do something unexpected or not do 
>>> something expected (or do the opposite things depending on the table type 
>>> or some flag) is worse than having users look up the correct way of 
>>> creating a data source in Beam SQL without expecting something we don't 
>>> promise.
>>> 
>>> >  (For example, a user guessing at the syntax of CREATE TABLE would have a 
>>> > better experience with the error being "field LOCATION not specified" 
>>> > rather than "operation CREATE TABLE not found".)
>>> 
>>> They have to look it up anyway (what format is location for a Pubsub topic? 
>>> or is it a subscription?), and when doing so I think it would be less 
>>> confusing to read that to get data from Pubsub/Kafka/... in Beam SQL you 
>>> have to do something like `REGISTER EXTERNAL DATA SOURCE` than `CREATE 
>>> TABLE`.
>>> 
>>> External tables and schemas don't have a standard approach and I don't have 
>>> a strong preference between any one from the above. 
>>> 
>>>> On Wed, Aug 15, 2018 at 1:08 PM Rui Wang <[email protected]> wrote:
>>>> Adding dev@ back now.
>>>> 
>>>> -Rui
>>>> 
>>>>> On Wed, Aug 15, 2018 at 1:01 PM Andrew Pilloud <[email protected]> 
>>>>> wrote:
>>>>> Did we drop the dev list from this on purpose? (I haven't added it back, 
>>>>> but we probably should.)
>>>>> 
>>>>> I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE 
>>>>> SCHEMA' if there is only to be one option. Sticking with those names 
>>>>> minimizes both our deviation from other implementations and user 
>>>>> surprise. There is a lot of value in making sure our common operations 
>>>>> closely map to the equivalent common operations in other SQL dialects. I 
>>>>> think users will be more confused to find that 'CREATE TABLE' doesn't 
>>>>> exist then to learn that it might not always create a table. This 
>>>>> minimizes the overhead of learning our dialect of SQL and maximizes the 
>>>>> odds that a user will be able to guess at the syntax of something and 
>>>>> have it work. (For example, a user guessing at the syntax of CREATE TABLE 
>>>>> would have a better experience with the error being "field LOCATION not 
>>>>> specified" rather than "operation CREATE TABLE not found".)
>>>>> 
>>>>> If the goal is clarity of the operation, how about 'REGISTER EXTERNAL 
>>>>> DATA SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names 
>>>>> remove the ambiguity around the operation creating and the data source 
>>>>> being a table.
>>>>> 
>>>>> Andrew
>>>>> 
>>>>>> On Wed, Aug 15, 2018 at 10:54 AM Anton Kedin <[email protected]> wrote:
>>>>>> My preference is to make `EXTERNAL` mandatory and only support `CREATE 
>>>>>> EXTERNAL TABLE` for existing semantics. My main reasons are:
>>>>>>  - user friendliness, matching expectations, readability. Current 
>>>>>> `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing 
>>>>>> to users familiar with SQL who expect that `CREATE TABLE` will actually 
>>>>>> create a table;
>>>>>>  - forward-compatibility. We could potentially support non-external 
>>>>>> `CREATE TABLE` at some point in the future, whatever semantics it might 
>>>>>> have. It will be wrong to use the same syntax for external and 
>>>>>> non-external CREATEs;
>>>>>> 
>>>>>> I agree that typing extra word each time is not ideal, but my opinion is 
>>>>>> on the side that readability of code (including SQL) is important (how 
>>>>>> much time you spend reading / understanding code vs writing it) and we 
>>>>>> should try to improve it if we can. In case of DDL every non-trivial 
>>>>>> statement will already have a ton of unavoidable words (field names, 
>>>>>> types, location, options) so I would argue that adding extra one word 
>>>>>> would not noticeably reduce your happiness of writing it :) But it would 
>>>>>> improve readability and reduce ambiguity, which I think is worth it.
>>>>>> 
>>>>>> I think that making it optional only introduces more confusion (e.g. 
>>>>>> what's the difference between the two DDL statements without reading the 
>>>>>> doc?) and would make situation worse.
>>>>>> 
>>>>>> Regards,
>>>>>> Anton
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Wed, Aug 15, 2018 at 10:24 AM Mingmin Xu <[email protected]> wrote:
>>>>>>> I prefer to `CREATE EXTERNAL TABLE`. My question is, do you plan to 
>>>>>>> support both `CREATE TABLE` and `CREATE EXTERNAL TABLE`, by making 
>>>>>>> `EXTERNAL` as optional?  
>>>>>>> 
>>>>>>>> On Wed, Aug 15, 2018 at 10:01 AM, Andrew Pilloud <[email protected]> 
>>>>>>>> wrote:
>>>>>>>> I think 'CREATE EXTERNAL TABLE' might make things a bit clearer from a 
>>>>>>>> documentation prospective, but I'd be really unhappy if I had to type 
>>>>>>>> out 'EXTERNAL' every time. (I have the same concern with 'CREATE 
>>>>>>>> EXTERNAL SCHEMA'.)
>>>>>>>> 
>>>>>>>> Andrew
>>>>>>>> 
>>>>>>>>> On Tue, Aug 14, 2018 at 12:38 PM Rui Wang <[email protected]> wrote:
>>>>>>>>> Hi guys,
>>>>>>>>> 
>>>>>>>>> I know you are probably using CREATE TABLE, Can I know your thoughts 
>>>>>>>>> on this?
>>>>>>>>> 
>>>>>>>>> -Rui
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Tue, Aug 14, 2018 at 10:22 AM Rui Wang <[email protected]> wrote:
>>>>>>>>>> Thanks Mikhail! "Import" is an alternative option. It might be 
>>>>>>>>>> better.
>>>>>>>>>> 
>>>>>>>>>> "create external" is being widely used by different systems with 
>>>>>>>>>> similar meaning so "create" usually is ok to external data sources.
>>>>>>>>>> 
>>>>>>>>>> -Rui
>>>>>>>>>> 
>>>>>>>>>>> On Tue, Aug 14, 2018 at 9:38 AM Mikhail Gryzykhin 
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>> The idea of clarification sounds good to me. I'd appreciate that 
>>>>>>>>>>> present, when I was triaging post-commit tests.
>>>>>>>>>>> 
>>>>>>>>>>> Do we have any terms that specify connection to external table? 
>>>>>>>>>>> "CREATE" word triggers this reaction in my brain that there will be 
>>>>>>>>>>> a new table created. Adding "EXTERNAL" would already add 
>>>>>>>>>>> distinction, but adding something more explicit for the task might 
>>>>>>>>>>> be even better.
>>>>>>>>>>> 
>>>>>>>>>>> --Mikhail
>>>>>>>>>>> 
>>>>>>>>>>> Have feedback? 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:40 PM Rafael Fernandez 
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>> Strictly speaking, they are not necessarily tables either. We 
>>>>>>>>>>>> could also introduce something like CREATE EXTERNAL DATA SOURCE 
>>>>>>>>>>>> (a-la T-SQL), if it's somehow advantageous for us to leverage 
>>>>>>>>>>>> access patterns or restrict DML statements.
>>>>>>>>>>>> 
>>>>>>>>>>>> I think your idea of CREATE EXTERNAL TABLE is practical :)
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Aug 13, 2018 at 2:12 PM Rui Wang <[email protected]> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Community,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BeamSQL allows CREATE TABLE statements to register virtual tables 
>>>>>>>>>>>>> from external storage systems (e.g. BigQuery). 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BeamSQL is not a storage system, so any table registered by 
>>>>>>>>>>>>> "CREATE TABLE" statement is essentially equivalent to be 
>>>>>>>>>>>>> registered by "CREATE EXTERNAL TABLE", which requires the user to 
>>>>>>>>>>>>> provide a LOCATION and BeamSQL will register the table outside of 
>>>>>>>>>>>>> current execution environment based on LOCATION.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So I propose to add EXTERNAL keyword to "CREATE TABLE" in BeamSQL 
>>>>>>>>>>>>> to help users understand they are registering tables, and BeamSQL 
>>>>>>>>>>>>> does not create non existing tables by running CREATE TABLE (at 
>>>>>>>>>>>>> least on some storage systems, if not all). 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We can make the EXTERNAL keyword either required or optional.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If we make the EXTERNAL keyword required:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Pros:
>>>>>>>>>>>>> a. We can get rid of the registering table semantic on CREATE 
>>>>>>>>>>>>> TABLE. 
>>>>>>>>>>>>> b, We keep the room that we could add CREATE TABLE back in the 
>>>>>>>>>>>>> future if we want CREATE TABLE to create, rather than not only 
>>>>>>>>>>>>> register tables in BeamSQL. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cons:
>>>>>>>>>>>>> 1. CREATE TABLE syntax will not be supported so existing BeamSQL 
>>>>>>>>>>>>> pipelines which has CREATE TABLE require changes.
>>>>>>>>>>>>> 2. It's required to type tedious EXTERNAL keyword every time, 
>>>>>>>>>>>>> especially in SQL Shell.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If we make the EXTERNAL keyword optional, we will have reversed 
>>>>>>>>>>>>> pros and cons above.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any thoughts on adding EXTERNAL keyword, and make it required or 
>>>>>>>>>>>>> optional?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Rui
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> ----
>>>>>>> Mingmin

Reply via email to