[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glenn Justo Galvizo updated ASTERIXDB-3059:
-------------------------------------------
     Attachment: Projection.png
                 SelectClause.png
    Description: 
It would be nice to have an option to exclude certain fields when using SELECT 
*. 
[BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
 gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'. 
'EXCEPT' seems like a nice addition to our grammar, here are a few examples on 
how we can extend our SELECT clause to include 'EXCEPT':

 

Let's start off with some DDLs and some data:
{code:sql}
CREATE TYPE     GenericType AS { _id: uuid };
CREATE DATASET  Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET  Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;

INSERT INTO     Users [
  { "user_id": 1, "name": "Sally",
    "address": { "zip_code": "92512", "state": "CA" } },
  { "user_id": 2, "name": "Mark" },
  { "user_id": 3, "name": "John" },
  { "user_id": 4, "name": "Tracy", "title": "Dr" }
];
INSERT INTO     Friends [
  { "user_id": 1, "friend": 2 },
  { "user_id": 1, "friend": 3 },
  { "user_id": 2, "friend": 1 },
  { "user_id": 3, "friend": 1 }
];{code}
 
----
 

Now we can get into the examples:

1. Get me all users U1 that have friends. Do not include the F dataset. In the 
example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast 
to BigQuery, we do not specify columns / fields in our exclusion list here, 
instead we specify an exclusion of _variables_ that would have otherwise been 
returned with our 'SELECT' clause.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT * EXCEPT F;{code}
Returns the following (Sally appears twice because she has two friends):
{code:java}
> { "U1":  { "user_id": 1, "name": "Sally",
             "address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 1, "name": "Sally", 
            "address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 3, "name": "John" } }{code}
2. Get me all unique users U1 that have friends. Do not include that user's 
address and title. The example below more closely aligns with BigQuery, as we 
specify columns / fields that we want to exclude from U1 in the exclusion list 
'address' and 'title'.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id 
SELECT DISTINCT U1.* EXCEPT address, title;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally" }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
3. If we want to exclude some field in a nested object, we can specify the 
fields within our object with the '.' syntax. Say we don't want to include the 
zipcode of an address object within a user that has friends. We would use the 
query below:
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
4. Now suppose we want users and the user documents of their immediate friends, 
excluding addresses from both user documents. We would use the following query:
{code:sql}
FROM   Users U1, Friends F, Users U2
WHERE  U1.user_id = F.user_id AND
       U2.user_id = F.friend
SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
Returns the following:
{code:java}
> { "U1": { "user_id": 1, "name": "Sally" }, 
    "U2": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 1, "name": "Sally" }, 
    "U2": { "user_id": 3, "name": "John" } }
> { "U1": { "user_id": 2, "name": "Mark" }, 
    "U2": { "user_id": 1, "name": "Sally" } }
> { "U1": { "user_id": 3, "name": "John" }, 
    "U2": { "user_id": 1, "name": "Sally" } }{code}
 
----
 

Now let's see some "problematic" queries:

1. U3 is not defined anywhere in the query, but the EXCEPT sees U3 as a 
potential field from the SELECT clause. For now we are going to execute our 
query without warnings, but it would be nice to raise a warning here in the 
future (otherwise, we would have to implement some form of type analysis into 
the lang module).
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT U1 EXCEPT U3;{code}
2. Similar to 1, our EXCEPT clause is extraneous. F is not visible (using 
EXCEPT sees what the SELECT returns semantics). We are going to execute our 
query without warnings, but it would be nice to raise a warning here in the 
future:
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT U1 EXCEPT F;{code}
3. Looking at our data, we do not have any documents that have a ssn field. 
But, we (currently) cannot infer the absence of an ssn field, so the query 
below should run without warnings / errors.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT U1 EXCEPT U1.ssn;{code}
4. (Raised by Don Chamberlin): In all of the previous examples, we could 
connect each field / nested field in the EXCEPT list using the first identifier 
in the field / nested field itself to some identifier in the SELECT list. With 
SELECT VALUE, we have no such identifier. Hence, we do not allow SELECT VALUE 
and EXCEPT to exist together to make it clearer what each field in our EXCEPT 
list refers to.
{code:sql}
FROM   Users U1
SELECT VALUE U1.address EXCEPT zip_code;{code}
 
----
The exact grammar I have in mind is given below, where we simply add the 
'EXCEPT' at the tail of our 'SELECT' clause. Semantically, the 'EXCEPT' clause 
surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2 
as an example, a logical equivalent would be:
{code:sql}
FROM   (
  FROM   Users U1, Friends F
  WHERE  U1.user_id = F.user_id
  SELECT DISTINCT U1.* 
) TMP
SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}

  was:
It would be nice to have an option to exclude certain fields when using SELECT 
*. 
[BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
 gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'. 
'EXCEPT' seems like a nice addition to our grammar, here are a few examples on 
how we can extend our SELECT clause to include 'EXCEPT':

 

Let's start off with some DDLs and some data:
{code:sql}
CREATE TYPE     GenericType AS { _id: uuid };
CREATE DATASET  Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET  Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;

INSERT INTO     Users [
  { "user_id": 1, "name": "Sally",
    "address": { "zip_code": "92512", "state": "CA" } },
  { "user_id": 2, "name": "Mark" },
  { "user_id": 3, "name": "John" },
  { "user_id": 4, "name": "Tracy", "title": "Dr" }
];
INSERT INTO     Friends [
  { "user_id": 1, "friend": 2 },
  { "user_id": 1, "friend": 3 },
  { "user_id": 2, "friend": 1 },
  { "user_id": 3, "friend": 1 }
];{code}
 
----
 

Now we can get into the examples:

1. Get me all users U1 that have friends. Do not include the F dataset. In the 
example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast 
to BigQuery, we do not specify columns / fields in our exclusion list here, 
instead we specify an exclusion of _variables_ that would have otherwise been 
returned with our 'SELECT' clause.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT * EXCEPT F;{code}
Returns the following (Sally appears twice because she has two friends):
{code:java}
> { "U1":  { "user_id": 1, "name": "Sally",
             "address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 1, "name": "Sally", 
            "address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 3, "name": "John" } }{code}
2. Get me all unique users U1 that have friends. Do not include that user's 
address and title. The example below more closely aligns with BigQuery, as we 
specify columns / fields that we want to exclude from U1 in the exclusion list 
'address' and 'title'.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id 
SELECT DISTINCT U1.* EXCEPT address, title;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally" }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
3. If we want to exclude some field in a nested object, we can specify the 
fields within our object with the '.' syntax. Say we don't want to include the 
zipcode of an address object within a user that has friends. We would use the 
query below:
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
4. Now suppose we want users and the user documents of their immediate friends, 
excluding addresses from both user documents. We would use the following query:
{code:sql}
FROM   Users U1, Friends F, Users U2
WHERE  U1.user_id = F.user_id AND
       U2.user_id = F.friend
SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
Returns the following:
{code:java}
> { "U1": { "user_id": 1, "name": "Sally" }, 
    "U2": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 1, "name": "Sally" }, 
    "U2": { "user_id": 3, "name": "John" } }
> { "U1": { "user_id": 2, "name": "Mark" }, 
    "U2": { "user_id": 1, "name": "Sally" } }
> { "U1": { "user_id": 3, "name": "John" }, 
    "U2": { "user_id": 1, "name": "Sally" } }{code}
 
----
 

Now let's see some "problematic" queries:

1. U3 is not defined anywhere in the query, but the EXCEPT sees U3 as a 
potential field from the SELECT clause. For now we are going to execute our 
query without warnings, but it would be nice to raise a warning here in the 
future (otherwise, we would have to implement some form of type analysis into 
the lang module).
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT U1 EXCEPT U3;{code}
2. Similar to 1, our EXCEPT clause is extraneous. F is not visible (using 
EXCEPT sees what the SELECT returns semantics). We are going to execute our 
query without warnings, but it would be nice to raise a warning here in the 
future:
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT U1 EXCEPT F;{code}
3. Looking at our data, we do not have any documents that have a ssn field. 
But, we (currently) cannot infer the absence of an ssn field, so the query 
below should run without warnings / errors.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT U1 EXCEPT U1.ssn;{code}
 
----
The exact grammar I have in mind is given below, where we simply add the 
'EXCEPT' at the tail of our 'SELECT' clause. Semantically, the 'EXCEPT' clause 
surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2 
as an example, a logical equivalent would be:
{code:sql}
FROM   (
  FROM   Users U1, Friends F
  WHERE  U1.user_id = F.user_id
  SELECT DISTINCT U1.* 
) TMP
SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}


> EXCEPT in SelectClause
> ----------------------
>
>                 Key: ASTERIXDB-3059
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3059
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>          Components: SQL - Translator SQL++
>            Reporter: Glenn Justo Galvizo
>            Assignee: Glenn Justo Galvizo
>            Priority: Major
>         Attachments: Projection.png, SelectClause.png
>
>
> It would be nice to have an option to exclude certain fields when using 
> SELECT *. 
> [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
>  gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, 
> ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few 
> examples on how we can extend our SELECT clause to include 'EXCEPT':
>  
> Let's start off with some DDLs and some data:
> {code:sql}
> CREATE TYPE     GenericType AS { _id: uuid };
> CREATE DATASET  Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET  Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
> INSERT INTO     Users [
>   { "user_id": 1, "name": "Sally",
>     "address": { "zip_code": "92512", "state": "CA" } },
>   { "user_id": 2, "name": "Mark" },
>   { "user_id": 3, "name": "John" },
>   { "user_id": 4, "name": "Tracy", "title": "Dr" }
> ];
> INSERT INTO     Friends [
>   { "user_id": 1, "friend": 2 },
>   { "user_id": 1, "friend": 3 },
>   { "user_id": 2, "friend": 1 },
>   { "user_id": 3, "friend": 1 }
> ];{code}
>  
> ----
>  
> Now we can get into the examples:
> 1. Get me all users U1 that have friends. Do not include the F dataset. In 
> the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In 
> contrast to BigQuery, we do not specify columns / fields in our exclusion 
> list here, instead we specify an exclusion of _variables_ that would have 
> otherwise been returned with our 'SELECT' clause.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT * EXCEPT F;{code}
> Returns the following (Sally appears twice because she has two friends):
> {code:java}
> > { "U1":  { "user_id": 1, "name": "Sally",
>              "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 1, "name": "Sally", 
>             "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 3, "name": "John" } }{code}
> 2. Get me all unique users U1 that have friends. Do not include that user's 
> address and title. The example below more closely aligns with BigQuery, as we 
> specify columns / fields that we want to exclude from U1 in the exclusion 
> list 'address' and 'title'.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id 
> SELECT DISTINCT U1.* EXCEPT address, title;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally" }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 3. If we want to exclude some field in a nested object, we can specify the 
> fields within our object with the '.' syntax. Say we don't want to include 
> the zipcode of an address object within a user that has friends. We would use 
> the query below:
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 4. Now suppose we want users and the user documents of their immediate 
> friends, excluding addresses from both user documents. We would use the 
> following query:
> {code:sql}
> FROM   Users U1, Friends F, Users U2
> WHERE  U1.user_id = F.user_id AND
>        U2.user_id = F.friend
> SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
> Returns the following:
> {code:java}
> > { "U1": { "user_id": 1, "name": "Sally" }, 
>     "U2": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 1, "name": "Sally" }, 
>     "U2": { "user_id": 3, "name": "John" } }
> > { "U1": { "user_id": 2, "name": "Mark" }, 
>     "U2": { "user_id": 1, "name": "Sally" } }
> > { "U1": { "user_id": 3, "name": "John" }, 
>     "U2": { "user_id": 1, "name": "Sally" } }{code}
>  
> ----
>  
> Now let's see some "problematic" queries:
> 1. U3 is not defined anywhere in the query, but the EXCEPT sees U3 as a 
> potential field from the SELECT clause. For now we are going to execute our 
> query without warnings, but it would be nice to raise a warning here in the 
> future (otherwise, we would have to implement some form of type analysis into 
> the lang module).
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT U1 EXCEPT U3;{code}
> 2. Similar to 1, our EXCEPT clause is extraneous. F is not visible (using 
> EXCEPT sees what the SELECT returns semantics). We are going to execute our 
> query without warnings, but it would be nice to raise a warning here in the 
> future:
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT U1 EXCEPT F;{code}
> 3. Looking at our data, we do not have any documents that have a ssn field. 
> But, we (currently) cannot infer the absence of an ssn field, so the query 
> below should run without warnings / errors.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT U1 EXCEPT U1.ssn;{code}
> 4. (Raised by Don Chamberlin): In all of the previous examples, we could 
> connect each field / nested field in the EXCEPT list using the first 
> identifier in the field / nested field itself to some identifier in the 
> SELECT list. With SELECT VALUE, we have no such identifier. Hence, we do not 
> allow SELECT VALUE and EXCEPT to exist together to make it clearer what each 
> field in our EXCEPT list refers to.
> {code:sql}
> FROM   Users U1
> SELECT VALUE U1.address EXCEPT zip_code;{code}
>  
> ----
> The exact grammar I have in mind is given below, where we simply add the 
> 'EXCEPT' at the tail of our 'SELECT' clause. Semantically, the 'EXCEPT' 
> clause surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. 
> Using #2 as an example, a logical equivalent would be:
> {code:sql}
> FROM   (
>   FROM   Users U1, Friends F
>   WHERE  U1.user_id = F.user_id
>   SELECT DISTINCT U1.* 
> ) TMP
> SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to