[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glenn Justo Galvizo updated ASTERIXDB-3059:
-------------------------------------------
    Attachment:     (was: SelectClause-1.png)

> EXCEPT in SelectClause
> ----------------------
>
>                 Key: ASTERIXDB-3059
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3059
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>          Components: SQL - Translator SQL++
>            Reporter: Glenn Justo Galvizo
>            Assignee: Glenn Justo Galvizo
>            Priority: Major
>         Attachments: Projection.png, SelectClause.png
>
>
> It would be nice to have an option to exclude certain fields when using 
> SELECT *. 
> [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
>  gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, 
> ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few 
> examples on how we can extend our SELECT clause to include 'EXCEPT':
>  
> Let's start off with some DDLs and some data:
> {code:sql}
> CREATE TYPE     GenericType AS { _id: uuid };
> CREATE DATASET  Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET  Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
> INSERT INTO     Users [
>   { "user_id": 1, "name": "Sally",
>     "address": { "zip_code": "92512", "state": "CA" } },
>   { "user_id": 2, "name": "Mark" },
>   { "user_id": 3, "name": "John" },
>   { "user_id": 4, "name": "Tracy", "title": "Dr" }
> ];
> INSERT INTO     Friends [
>   { "user_id": 1, "friend": 2 },
>   { "user_id": 1, "friend": 3 },
>   { "user_id": 2, "friend": 1 },
>   { "user_id": 3, "friend": 1 }
> ];{code}
>  
> ----
>  
> Now we can get into the examples:
> 1. Get me all users U1 that have friends. Do not include the F dataset. In 
> the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In 
> contrast to BigQuery, we do not specify columns / fields in our exclusion 
> list here, instead we specify an exclusion of _variables_ that would have 
> otherwise been returned with our 'SELECT' clause.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT * EXCEPT F;{code}
> Returns the following (Sally appears twice because she has two friends):
> {code:java}
> > { "U1":  { "user_id": 1, "name": "Sally",
>              "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 1, "name": "Sally", 
>             "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 3, "name": "John" } }{code}
> 2. Get me all unique users U1 that have friends. Do not include that user's 
> address and title. The example below more closely aligns with BigQuery, as we 
> specify columns / fields that we want to exclude from U1 in the exclusion 
> list 'address' and 'title'.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id 
> SELECT DISTINCT U1.* EXCEPT address, title;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally" }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 3. If we want to exclude some field in a nested object, we can specify the 
> fields within our object with the '.' syntax. Say we don't want to include 
> the zipcode of an address object within a user that has friends. We would use 
> the query below:
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 4. Now suppose we want users and the user documents of their immediate 
> friends, excluding addresses from both user documents. We would use the 
> following query:
> {code:sql}
> FROM   Users U1, Friends F, Users U2
> WHERE  U1.user_id = F.user_id AND
>        U2.user_id = F.friend
> SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
> Returns the following:
> {code:java}
> > { "U1": { "user_id": 1, "name": "Sally" }, 
>     "U2": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 1, "name": "Sally" }, 
>     "U2": { "user_id": 3, "name": "John" } }
> > { "U1": { "user_id": 2, "name": "Mark" }, 
>     "U2": { "user_id": 1, "name": "Sally" } }
> > { "U1": { "user_id": 3, "name": "John" }, 
>     "U2": { "user_id": 1, "name": "Sally" } }{code}
>  
> ----
>  
> Now let's see some "problematic" queries:
> 1. U3 is not defined anywhere in the query, but the EXCEPT sees U3 as a 
> potential field from the SELECT clause. For now we are going to execute our 
> query without warnings, but it would be nice to raise a warning here in the 
> future (otherwise, we would have to implement some form of type analysis into 
> the lang module).
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT U1 EXCEPT U3;{code}
> 2. Similar to 1, our EXCEPT clause is extraneous. F is not visible (using 
> EXCEPT sees what the SELECT returns semantics). We are going to execute our 
> query without warnings, but it would be nice to raise a warning here in the 
> future:
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT U1 EXCEPT F;{code}
> 3. Looking at our data, we do not have any documents that have a ssn field. 
> But, we (currently) cannot infer the absence of an ssn field, so the query 
> below should run without warnings / errors.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT U1 EXCEPT U1.ssn;{code}
> 4. (Raised by Don Chamberlin): In all of the previous examples, we could 
> connect each field / nested field in the EXCEPT list using the first 
> identifier in the field / nested field itself to some identifier in the 
> SELECT list. With SELECT VALUE, we have no such identifier. Hence, we do not 
> allow SELECT VALUE and EXCEPT to exist together to make it clearer what each 
> field in our EXCEPT list refers to.
> {code:sql}
> FROM   Users U1
> SELECT VALUE U1.address EXCEPT zip_code;{code}
>  
> ----
> The exact grammar I have in mind is given below, where we simply add the 
> 'EXCEPT' at the tail of our 'SELECT' clause. Semantically, the 'EXCEPT' 
> clause surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. 
> Using #2 as an example, a logical equivalent would be:
> {code:sql}
> FROM   (
>   FROM   Users U1, Friends F
>   WHERE  U1.user_id = F.user_id
>   SELECT DISTINCT U1.* 
> ) TMP
> SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to