[
https://issues.apache.org/jira/browse/ASTERIXDB-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Glenn Justo Galvizo updated ASTERIXDB-3059:
-------------------------------------------
Description:
It would be nice to have an option to exclude certain fields when using SELECT
*.
[BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'.
'EXCEPT' seems like a nice addition to our grammar, here are a few examples on
how we can extend our SELECT clause to include 'EXCEPT':
Let's start off with some DDLs and some data:
{code:sql}
CREATE TYPE GenericType AS { _id: uuid };
CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
INSERT INTO Users [
{ "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } },
{ "user_id": 2, "name": "Mark" },
{ "user_id": 3, "name": "John" },
{ "user_id": 4, "name": "Tracy", "title": "Dr" }
];
INSERT INTO Friends [
{ "user_id": 1, "friend": 2 },
{ "user_id": 1, "friend": 3 },
{ "user_id": 2, "friend": 1 },
{ "user_id": 3, "friend": 1 }
];{code}
----
Now we can get into the examples:
1. Get me all users U1 that have friends. Do not include the F dataset. In the
example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast
to BigQuery, we do not specify columns / fields in our exclusion list here,
instead we specify an exclusion of _variables_ that would have otherwise been
returned with our 'SELECT' clause.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT * EXCEPT F;{code}
Returns the following:
{code:java}
> { "U1": { "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 3, "name": "John" } }{code}
2. Get me all users U1 that have friends. Do not include that user's address
and title. The example below more closely aligns with BigQuery, as we specify
columns / fields that we want to exclude from U1 in the exclusion list
'address' and 'title'.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address, title;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally" }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
3. If we want to exclude some field in a nested object, we can specify the
fields within our object with the '.' syntax. Say we don't want to include the
zipcode of an address object within a user that has friends. We would use the
query below:
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
4. Now suppose we want users and the user documents of their immediate friends,
excluding addresses from both user documents. We would use the following query:
{code:sql}
FROM Users U1, Friends F, Users U2
WHERE U1.user_id = F.user_id AND
U2.user_id = F.friend
SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
Returns the following:
{code:java}
> { "U1": { "user_id": 1, "name": "Sally" },
"U2": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 1, "name": "Sally" },
"U2": { "user_id": 3, "name": "John" } }
> { "U1": { "user_id": 2, "name": "Mark" },
"U2": { "user_id": 1, "name": "Sally" } }
> { "U1": { "user_id": 3, "name": "John" },
"U2": { "user_id": 1, "name": "Sally" } }{code}
----
The exact grammar I have in mind is given below, where we simply add the
'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause
surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2
as an example, a logical equivalent would be:
{code:sql}
FROM (
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.*
) TMP
SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}
was:
It would be nice to have an option to exclude certain fields when using SELECT
*.
[BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'.
'EXCEPT' seems like a nice addition to our grammar, here are a few examples on
how we can extend our SELECT clause to include 'EXCEPT':
Let's start off with some DDLs and some data:
{code:sql}
CREATE TYPE GenericType AS { _id: uuid };
CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
INSERT INTO Users [
{ "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } },
{ "user_id": 2, "name": "Mark" },
{ "user_id": 3, "name": "John" },
{ "user_id": 4, "name": "Tracy", "title": "Dr" }
];
INSERT INTO Friends [
{ "user_id": 1, "friend": 2 },
{ "user_id": 1, "friend": 3 },
{ "user_id": 2, "friend": 1 },
{ "user_id": 3, "friend": 1 }
];{code}
----
Now we can get into the examples:
1. Get me all users U1 that have friends. Do not include the F dataset. In the
example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast
to BigQuery, we do not specify columns / fields in our exclusion list here,
instead we specify an exclusion of _variables_ that would have otherwise been
returned with our 'SELECT' clause.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT * EXCEPT F;{code}
Returns the following:
{code:java}
> { "U1": { "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 3, "name": "John" } }{code}
2. Get me all users U1 that have friends. Do not include that user's address
and title. The example below more closely aligns with BigQuery, as we specify
columns / fields that we want to exclude from U1 in the exclusion list
'address' and 'title'.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address, title;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally" }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
3. If we want to exclude some field in a nested object, we can specify the
fields within our object with the '.' syntax. Say we don't want to include the
zipcode of an address object within a user that has friends. We would use the
query below:
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
----
The exact grammar I have in mind is given below, where we simply add the
'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause
surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2
as an example, a logical equivalent would be:
{code:sql}
FROM (
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.*
) TMP
SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}
> EXCEPT in SelectClause
> ----------------------
>
> Key: ASTERIXDB-3059
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3059
> Project: Apache AsterixDB
> Issue Type: New Feature
> Components: SQL - Translator SQL++
> Reporter: Glenn Justo Galvizo
> Assignee: Glenn Justo Galvizo
> Priority: Major
> Attachments: SelectClause-1.png
>
>
> It would be nice to have an option to exclude certain fields when using
> SELECT *.
> [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
> gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2,
> ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few
> examples on how we can extend our SELECT clause to include 'EXCEPT':
>
> Let's start off with some DDLs and some data:
> {code:sql}
> CREATE TYPE GenericType AS { _id: uuid };
> CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
> INSERT INTO Users [
> { "user_id": 1, "name": "Sally",
> "address": { "zip_code": "92512", "state": "CA" } },
> { "user_id": 2, "name": "Mark" },
> { "user_id": 3, "name": "John" },
> { "user_id": 4, "name": "Tracy", "title": "Dr" }
> ];
> INSERT INTO Friends [
> { "user_id": 1, "friend": 2 },
> { "user_id": 1, "friend": 3 },
> { "user_id": 2, "friend": 1 },
> { "user_id": 3, "friend": 1 }
> ];{code}
> ----
> Now we can get into the examples:
> 1. Get me all users U1 that have friends. Do not include the F dataset. In
> the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In
> contrast to BigQuery, we do not specify columns / fields in our exclusion
> list here, instead we specify an exclusion of _variables_ that would have
> otherwise been returned with our 'SELECT' clause.
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT * EXCEPT F;{code}
> Returns the following:
> {code:java}
> > { "U1": { "user_id": 1, "name": "Sally",
> "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 3, "name": "John" } }{code}
> 2. Get me all users U1 that have friends. Do not include that user's address
> and title. The example below more closely aligns with BigQuery, as we specify
> columns / fields that we want to exclude from U1 in the exclusion list
> 'address' and 'title'.
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT DISTINCT U1.* EXCEPT address, title;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally" }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 3. If we want to exclude some field in a nested object, we can specify the
> fields within our object with the '.' syntax. Say we don't want to include
> the zipcode of an address object within a user that has friends. We would use
> the query below:
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 4. Now suppose we want users and the user documents of their immediate
> friends, excluding addresses from both user documents. We would use the
> following query:
> {code:sql}
> FROM Users U1, Friends F, Users U2
> WHERE U1.user_id = F.user_id AND
> U2.user_id = F.friend
> SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
> Returns the following:
> {code:java}
> > { "U1": { "user_id": 1, "name": "Sally" },
> "U2": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 1, "name": "Sally" },
> "U2": { "user_id": 3, "name": "John" } }
> > { "U1": { "user_id": 2, "name": "Mark" },
> "U2": { "user_id": 1, "name": "Sally" } }
> > { "U1": { "user_id": 3, "name": "John" },
> "U2": { "user_id": 1, "name": "Sally" } }{code}
> ----
> The exact grammar I have in mind is given below, where we simply add the
> 'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause
> surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using
> #2 as an example, a logical equivalent would be:
> {code:sql}
> FROM (
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT DISTINCT U1.*
> ) TMP
> SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)