[
https://issues.apache.org/jira/browse/ASTERIXDB-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Glenn Justo Galvizo updated ASTERIXDB-3059:
-------------------------------------------
Attachment: Projection.png
SelectClause.png
Description:
It would be nice to have an option to exclude certain fields when using SELECT
*.
[BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'.
'EXCEPT' seems like a nice addition to our grammar, here are a few examples on
how we can extend our SELECT clause to include 'EXCEPT':
Let's start off with some DDLs and some data:
{code:sql}
CREATE TYPE GenericType AS { _id: uuid };
CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
INSERT INTO Users [
{ "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } },
{ "user_id": 2, "name": "Mark" },
{ "user_id": 3, "name": "John" },
{ "user_id": 4, "name": "Tracy", "title": "Dr" }
];
INSERT INTO Friends [
{ "user_id": 1, "friend": 2 },
{ "user_id": 1, "friend": 3 },
{ "user_id": 2, "friend": 1 },
{ "user_id": 3, "friend": 1 }
];{code}
----
Now we can get into the examples:
1. Get me all users U1 that have friends. Do not include the F dataset. In the
example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast
to BigQuery, we do not specify columns / fields in our exclusion list here,
instead we specify an exclusion of _variables_ that would have otherwise been
returned with our 'SELECT' clause.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT * EXCEPT F;{code}
Returns the following (Sally appears twice because she has two friends):
{code:java}
> { "U1": { "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 3, "name": "John" } }{code}
2. Get me all unique users U1 that have friends. Do not include that user's
address and title. The example below more closely aligns with BigQuery, as we
specify columns / fields that we want to exclude from U1 in the exclusion list
'address' and 'title'.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address, title;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally" }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
3. If we want to exclude some field in a nested object, we can specify the
fields within our object with the '.' syntax. Say we don't want to include the
zipcode of an address object within a user that has friends. We would use the
query below:
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
4. Now suppose we want users and the user documents of their immediate friends,
excluding addresses from both user documents. We would use the following query:
{code:sql}
FROM Users U1, Friends F, Users U2
WHERE U1.user_id = F.user_id AND
U2.user_id = F.friend
SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
Returns the following:
{code:java}
> { "U1": { "user_id": 1, "name": "Sally" },
"U2": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 1, "name": "Sally" },
"U2": { "user_id": 3, "name": "John" } }
> { "U1": { "user_id": 2, "name": "Mark" },
"U2": { "user_id": 1, "name": "Sally" } }
> { "U1": { "user_id": 3, "name": "John" },
"U2": { "user_id": 1, "name": "Sally" } }{code}
----
Now let's see some "problematic" queries:
1. U3 is not defined anywhere in the query, but the EXCEPT sees U3 as a
potential field from the SELECT clause. For now we are going to execute our
query without warnings, but it would be nice to raise a warning here in the
future (otherwise, we would have to implement some form of type analysis into
the lang module).
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT U1 EXCEPT U3;{code}
2. Similar to 1, our EXCEPT clause is extraneous. F is not visible (using
EXCEPT sees what the SELECT returns semantics). We are going to execute our
query without warnings, but it would be nice to raise a warning here in the
future:
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT U1 EXCEPT F;{code}
3. Looking at our data, we do not have any documents that have a ssn field.
But, we (currently) cannot infer the absence of an ssn field, so the query
below should run without warnings / errors.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT U1 EXCEPT U1.ssn;{code}
4. (Raised by Don Chamberlin): In all of the previous examples, we could
connect each field / nested field in the EXCEPT list using the first identifier
in the field / nested field itself to some identifier in the SELECT list. With
SELECT VALUE, we have no such identifier. Hence, we do not allow SELECT VALUE
and EXCEPT to exist together to make it clearer what each field in our EXCEPT
list refers to.
{code:sql}
FROM Users U1
SELECT VALUE U1.address EXCEPT zip_code;{code}
----
The exact grammar I have in mind is given below, where we simply add the
'EXCEPT' at the tail of our 'SELECT' clause. Semantically, the 'EXCEPT' clause
surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2
as an example, a logical equivalent would be:
{code:sql}
FROM (
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.*
) TMP
SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}
was:
It would be nice to have an option to exclude certain fields when using SELECT
*.
[BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'.
'EXCEPT' seems like a nice addition to our grammar, here are a few examples on
how we can extend our SELECT clause to include 'EXCEPT':
Let's start off with some DDLs and some data:
{code:sql}
CREATE TYPE GenericType AS { _id: uuid };
CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
INSERT INTO Users [
{ "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } },
{ "user_id": 2, "name": "Mark" },
{ "user_id": 3, "name": "John" },
{ "user_id": 4, "name": "Tracy", "title": "Dr" }
];
INSERT INTO Friends [
{ "user_id": 1, "friend": 2 },
{ "user_id": 1, "friend": 3 },
{ "user_id": 2, "friend": 1 },
{ "user_id": 3, "friend": 1 }
];{code}
----
Now we can get into the examples:
1. Get me all users U1 that have friends. Do not include the F dataset. In the
example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast
to BigQuery, we do not specify columns / fields in our exclusion list here,
instead we specify an exclusion of _variables_ that would have otherwise been
returned with our 'SELECT' clause.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT * EXCEPT F;{code}
Returns the following (Sally appears twice because she has two friends):
{code:java}
> { "U1": { "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 1, "name": "Sally",
"address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 3, "name": "John" } }{code}
2. Get me all unique users U1 that have friends. Do not include that user's
address and title. The example below more closely aligns with BigQuery, as we
specify columns / fields that we want to exclude from U1 in the exclusion list
'address' and 'title'.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address, title;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally" }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
3. If we want to exclude some field in a nested object, we can specify the
fields within our object with the '.' syntax. Say we don't want to include the
zipcode of an address object within a user that has friends. We would use the
query below:
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
4. Now suppose we want users and the user documents of their immediate friends,
excluding addresses from both user documents. We would use the following query:
{code:sql}
FROM Users U1, Friends F, Users U2
WHERE U1.user_id = F.user_id AND
U2.user_id = F.friend
SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
Returns the following:
{code:java}
> { "U1": { "user_id": 1, "name": "Sally" },
"U2": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 1, "name": "Sally" },
"U2": { "user_id": 3, "name": "John" } }
> { "U1": { "user_id": 2, "name": "Mark" },
"U2": { "user_id": 1, "name": "Sally" } }
> { "U1": { "user_id": 3, "name": "John" },
"U2": { "user_id": 1, "name": "Sally" } }{code}
----
Now let's see some "problematic" queries:
1. U3 is not defined anywhere in the query, but the EXCEPT sees U3 as a
potential field from the SELECT clause. For now we are going to execute our
query without warnings, but it would be nice to raise a warning here in the
future (otherwise, we would have to implement some form of type analysis into
the lang module).
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT U1 EXCEPT U3;{code}
2. Similar to 1, our EXCEPT clause is extraneous. F is not visible (using
EXCEPT sees what the SELECT returns semantics). We are going to execute our
query without warnings, but it would be nice to raise a warning here in the
future:
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT U1 EXCEPT F;{code}
3. Looking at our data, we do not have any documents that have a ssn field.
But, we (currently) cannot infer the absence of an ssn field, so the query
below should run without warnings / errors.
{code:sql}
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT U1 EXCEPT U1.ssn;{code}
----
The exact grammar I have in mind is given below, where we simply add the
'EXCEPT' at the tail of our 'SELECT' clause. Semantically, the 'EXCEPT' clause
surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2
as an example, a logical equivalent would be:
{code:sql}
FROM (
FROM Users U1, Friends F
WHERE U1.user_id = F.user_id
SELECT DISTINCT U1.*
) TMP
SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}
> EXCEPT in SelectClause
> ----------------------
>
> Key: ASTERIXDB-3059
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3059
> Project: Apache AsterixDB
> Issue Type: New Feature
> Components: SQL - Translator SQL++
> Reporter: Glenn Justo Galvizo
> Assignee: Glenn Justo Galvizo
> Priority: Major
> Attachments: Projection.png, SelectClause.png
>
>
> It would be nice to have an option to exclude certain fields when using
> SELECT *.
> [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]
> gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2,
> ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few
> examples on how we can extend our SELECT clause to include 'EXCEPT':
>
> Let's start off with some DDLs and some data:
> {code:sql}
> CREATE TYPE GenericType AS { _id: uuid };
> CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
> INSERT INTO Users [
> { "user_id": 1, "name": "Sally",
> "address": { "zip_code": "92512", "state": "CA" } },
> { "user_id": 2, "name": "Mark" },
> { "user_id": 3, "name": "John" },
> { "user_id": 4, "name": "Tracy", "title": "Dr" }
> ];
> INSERT INTO Friends [
> { "user_id": 1, "friend": 2 },
> { "user_id": 1, "friend": 3 },
> { "user_id": 2, "friend": 1 },
> { "user_id": 3, "friend": 1 }
> ];{code}
>
> ----
>
> Now we can get into the examples:
> 1. Get me all users U1 that have friends. Do not include the F dataset. In
> the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In
> contrast to BigQuery, we do not specify columns / fields in our exclusion
> list here, instead we specify an exclusion of _variables_ that would have
> otherwise been returned with our 'SELECT' clause.
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT * EXCEPT F;{code}
> Returns the following (Sally appears twice because she has two friends):
> {code:java}
> > { "U1": { "user_id": 1, "name": "Sally",
> "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 1, "name": "Sally",
> "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 3, "name": "John" } }{code}
> 2. Get me all unique users U1 that have friends. Do not include that user's
> address and title. The example below more closely aligns with BigQuery, as we
> specify columns / fields that we want to exclude from U1 in the exclusion
> list 'address' and 'title'.
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT DISTINCT U1.* EXCEPT address, title;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally" }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 3. If we want to exclude some field in a nested object, we can specify the
> fields within our object with the '.' syntax. Say we don't want to include
> the zipcode of an address object within a user that has friends. We would use
> the query below:
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 4. Now suppose we want users and the user documents of their immediate
> friends, excluding addresses from both user documents. We would use the
> following query:
> {code:sql}
> FROM Users U1, Friends F, Users U2
> WHERE U1.user_id = F.user_id AND
> U2.user_id = F.friend
> SELECT U1, U2 EXCEPT U1.address, U2.address;{code}
> Returns the following:
> {code:java}
> > { "U1": { "user_id": 1, "name": "Sally" },
> "U2": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 1, "name": "Sally" },
> "U2": { "user_id": 3, "name": "John" } }
> > { "U1": { "user_id": 2, "name": "Mark" },
> "U2": { "user_id": 1, "name": "Sally" } }
> > { "U1": { "user_id": 3, "name": "John" },
> "U2": { "user_id": 1, "name": "Sally" } }{code}
>
> ----
>
> Now let's see some "problematic" queries:
> 1. U3 is not defined anywhere in the query, but the EXCEPT sees U3 as a
> potential field from the SELECT clause. For now we are going to execute our
> query without warnings, but it would be nice to raise a warning here in the
> future (otherwise, we would have to implement some form of type analysis into
> the lang module).
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT U1 EXCEPT U3;{code}
> 2. Similar to 1, our EXCEPT clause is extraneous. F is not visible (using
> EXCEPT sees what the SELECT returns semantics). We are going to execute our
> query without warnings, but it would be nice to raise a warning here in the
> future:
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT U1 EXCEPT F;{code}
> 3. Looking at our data, we do not have any documents that have a ssn field.
> But, we (currently) cannot infer the absence of an ssn field, so the query
> below should run without warnings / errors.
> {code:sql}
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT U1 EXCEPT U1.ssn;{code}
> 4. (Raised by Don Chamberlin): In all of the previous examples, we could
> connect each field / nested field in the EXCEPT list using the first
> identifier in the field / nested field itself to some identifier in the
> SELECT list. With SELECT VALUE, we have no such identifier. Hence, we do not
> allow SELECT VALUE and EXCEPT to exist together to make it clearer what each
> field in our EXCEPT list refers to.
> {code:sql}
> FROM Users U1
> SELECT VALUE U1.address EXCEPT zip_code;{code}
>
> ----
> The exact grammar I have in mind is given below, where we simply add the
> 'EXCEPT' at the tail of our 'SELECT' clause. Semantically, the 'EXCEPT'
> clause surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS.
> Using #2 as an example, a logical equivalent would be:
> {code:sql}
> FROM (
> FROM Users U1, Friends F
> WHERE U1.user_id = F.user_id
> SELECT DISTINCT U1.*
> ) TMP
> SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)