vtlim commented on code in PR #13736:
URL: https://github.com/apache/druid/pull/13736#discussion_r1152555855
##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources.
They exist only in
For more information about table, lookup, query, and join datasources, refer
to the [Datasources](datasource.md)
documentation.
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md).
Their API and behavior are subject
+> to change in future releases. It is not recommended to use this feature in
production at this time.
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest
datasource](./datasource.md#unnest). The source for UNNEST can be an array or
an input that's been transformed into an array, such as with helper functions
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS
table_alias_name2(column_alias_name2), ...
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the
following:
+ * A table, such as `FROM a_table`.
+ * A subset of a table based on a query, a filter, or a JOIN. For example,
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+* The `source_expression` for the UNNEST function must be an array and can
come from any expression. If the dimension you are unnesting is a multi-value
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an
implicit ARRAY type. You can also specify any expression that has an SQL array
datatype. For example, you can call UNNEST on the following:
+ * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions.
+ * `ARRAY_CONCAT(dim1,dim2)` if you have to concatenate two multi-value
dimensions.
Review Comment:
```suggestion
* `ARRAY_CONCAT(dim1,dim2)` if you want to concatenate two multi-value
dimensions.
```
##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources.
They exist only in
For more information about table, lookup, query, and join datasources, refer
to the [Datasources](datasource.md)
documentation.
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md).
Their API and behavior are subject
Review Comment:
```suggestion
> The UNNEST SQL function is [experimental](../development/experimental.md).
Its API and behavior are subject
```
##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources.
They exist only in
For more information about table, lookup, query, and join datasources, refer
to the [Datasources](datasource.md)
documentation.
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md).
Their API and behavior are subject
+> to change in future releases. It is not recommended to use this feature in
production at this time.
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest
datasource](./datasource.md#unnest). The source for UNNEST can be an array or
an input that's been transformed into an array, such as with helper functions
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS
table_alias_name2(column_alias_name2), ...
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the
following:
+ * A table, such as `FROM a_table`.
+ * A subset of a table based on a query, a filter, or a JOIN. For example,
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+* The `source_expression` for the UNNEST function must be an array and can
come from any expression. If the dimension you are unnesting is a multi-value
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an
implicit ARRAY type. You can also specify any expression that has an SQL array
datatype. For example, you can call UNNEST on the following:
+ * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions.
+ * `ARRAY_CONCAT(dim1,dim2)` if you have to concatenate two multi-value
dimensions.
+* The `AS table_alias_name(column_alias_name)` clause is not required but is
highly recommended. Use it to specify the output, which can be an existing
column or a new one. Replace `table_alias_name` and `column_alias_name` with a
table and column name you want to alias the unnested results to. If you don't
provide this, Druid uses a nondescriptive name, such as `EXPR$0`.
+
+Keep these two things in mind when writing your query:
+
+- You must include the context parameter `"enableUnnest": true`.
+- You can unnest multiple source expressions in a single query.
+- Notice the comma between the datasource and the UNNEST function. This is
needed in most cases of the UNNEST function. Specifically, it is not needed
when you're unnesting an inline array since the array itself is the datasource.
+- If you view the native explanation of a SQL UNNEST, you'll notice that Druid
uses `j0.unnest` as a virtual column to perform the unnest. An underscore is
added for each unnest, so you may notice virtual columns named `_j0.unnest` or
`__j0.unnest`.
+
+For examples, see the [Unnest arrays
tutorial](../tutorials/tutorial-unnest-arrays.md).
+
+The UNNEST function has the following limitations:
+
+- The function does not remove any duplicates or nulls in an array. Nulls will
be treated as any other value in an array. If there are multiple nulls within
the array, a record corresponding to each of the nulls gets created.
+- Arrays inside complex JSON types are not supported.
+- You cannot perform an UNNEST at ingestion time, including SQL-based
ingestion using the MSQ task engine.
+- UNNEST preserves the ordering in the source array that is being unnested.
Review Comment:
Is this a limitation or rather just a description of how UNNEST functions?
##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
SELECT * FROM nested_data
```
-In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries,
such as Scan.
+In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries
against the unnested records. Depending on the type of queries you write, see
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using
native queries](#unnest-using-native-queries).
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`.
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records
in `dim3` to arrays. It is required since `dim3` is a multi-value string
dimension.
+
+If the column you are unnesting is not a string dimension, then you do not
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The
following query returns the two source columns and a third virtual column
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the
total number of rows has grown. The table `nested_data` had only seven rows
originally.
+
+Another way to unnest a virtual column is to concatenate them with
ARRAY_CONCAT:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS
example_table(d45)
+```
+
+Decide which method to use based on what your goals are.
+
+### Unnest multiple source expressions
+
+You can include multiple UNNEST clauses in a single query. Each `UNNEST`
clause needs the following:
+
+```sql
+UNNEST(source_expression) AS table_alias_name(column_alias_name)
+```
+
+The `table_alias_name` and `column_alias_name` for each UNNEST clause should
be unique.
+
+The example query returns the following from the `nested_data` datasource:
+
+- the source columns `dim3`, `dim4`, and `dim5`
+- an unnested version of `dim3` aliased to `d3`
+- an unnested virtual column composed of `dim4` and `dim5` aliased to `d45`
+
+```sql
+SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3"))
AS ud(d3), UNNEST(ARRAY[dim4,dim5]) AS foo(d45)
Review Comment:
```suggestion
SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3"))
AS foo1(d3), UNNEST(ARRAY[dim4,dim5]) AS foo2(d45)
```
##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
SELECT * FROM nested_data
```
-In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries,
such as Scan.
+In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries
against the unnested records. Depending on the type of queries you write, see
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using
native queries](#unnest-using-native-queries).
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`.
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records
in `dim3` to arrays. It is required since `dim3` is a multi-value string
dimension.
+
+If the column you are unnesting is not a string dimension, then you do not
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The
following query returns the two source columns and a third virtual column
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the
total number of rows has grown. The table `nested_data` had only seven rows
originally.
+
+Another way to unnest a virtual column is to concatenate them with
ARRAY_CONCAT:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS
example_table(d45)
+```
+
+Decide which method to use based on what your goals are.
+
+### Unnest multiple source expressions
+
+You can include multiple UNNEST clauses in a single query. Each `UNNEST`
clause needs the following:
+
+```sql
+UNNEST(source_expression) AS table_alias_name(column_alias_name)
+```
+
+The `table_alias_name` and `column_alias_name` for each UNNEST clause should
be unique.
+
+The example query returns the following from the `nested_data` datasource:
+
+- the source columns `dim3`, `dim4`, and `dim5`
+- an unnested version of `dim3` aliased to `d3`
+- an unnested virtual column composed of `dim4` and `dim5` aliased to `d45`
+
+```sql
+SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3"))
AS ud(d3), UNNEST(ARRAY[dim4,dim5]) AS foo(d45)
+```
+
+
+### Unnest a column from a subset of a table
+
+The following query uses only three columns from the `nested_data` table as
the datasource. From that subset, it unnests the column `dim3` into `d3` and
returns `d3`.
+
+```sql
+SELECT d3 FROM (select dim1, dim2, dim3 from "nested_data"),
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
Review Comment:
```suggestion
SELECT d3 FROM (SELECT dim1, dim2, dim3 FROM "nested_data"),
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
```
##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
SELECT * FROM nested_data
```
-In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries,
such as Scan.
+In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries
against the unnested records. Depending on the type of queries you write, see
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using
native queries](#unnest-using-native-queries).
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`.
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records
in `dim3` to arrays. It is required since `dim3` is a multi-value string
dimension.
+
+If the column you are unnesting is not a string dimension, then you do not
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The
following query returns the two source columns and a third virtual column
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the
total number of rows has grown. The table `nested_data` had only seven rows
originally.
+
+Another way to unnest a virtual column is to concatenate them with
ARRAY_CONCAT:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS
example_table(d45)
+```
+
+Decide which method to use based on what your goals are.
+
+### Unnest multiple source expressions
+
+You can include multiple UNNEST clauses in a single query. Each `UNNEST`
clause needs the following:
+
+```sql
+UNNEST(source_expression) AS table_alias_name(column_alias_name)
+```
+
+The `table_alias_name` and `column_alias_name` for each UNNEST clause should
be unique.
+
+The example query returns the following from the `nested_data` datasource:
+
+- the source columns `dim3`, `dim4`, and `dim5`
+- an unnested version of `dim3` aliased to `d3`
+- an unnested virtual column composed of `dim4` and `dim5` aliased to `d45`
+
+```sql
+SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3"))
AS ud(d3), UNNEST(ARRAY[dim4,dim5]) AS foo(d45)
+```
+
+
+### Unnest a column from a subset of a table
+
+The following query uses only three columns from the `nested_data` table as
the datasource. From that subset, it unnests the column `dim3` into `d3` and
returns `d3`.
+
+```sql
+SELECT d3 FROM (select dim1, dim2, dim3 from "nested_data"),
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+### Unnest with a filter
+
+You can specify which rows to unnest by including a filter in your query. The
following query:
+
+* Filters the source expression based on `dim2`
+* Unnests the records in `dim3` into `d3`
+* Returns the records for the unnested `d3` that have a `dim2` record that
matches the filter
+
+```sql
+SELECT d3 FROM (SELECT * FROM nested_data WHERE dim2 IN ('abc')),
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+You can also filter the results of an UNNEST clause. The following example
unnests the inline array `[1,2,3]` but only returns the rows that match the
filter:
+
+```sql
+SELECT * FROM UNNEST(ARRAY[1,2,3]) AS ud(d1) WHERE d1 IN ('1','2')
Review Comment:
```suggestion
SELECT * FROM UNNEST(ARRAY[1,2,3]) AS example_table(d1) WHERE d1 IN ('1','2')
```
Matches the example above
##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources.
They exist only in
For more information about table, lookup, query, and join datasources, refer
to the [Datasources](datasource.md)
documentation.
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md).
Their API and behavior are subject
+> to change in future releases. It is not recommended to use this feature in
production at this time.
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest
datasource](./datasource.md#unnest). The source for UNNEST can be an array or
an input that's been transformed into an array, such as with helper functions
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS
table_alias_name2(column_alias_name2), ...
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the
following:
+ * A table, such as `FROM a_table`.
+ * A subset of a table based on a query, a filter, or a JOIN. For example,
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+* The `source_expression` for the UNNEST function must be an array and can
come from any expression. If the dimension you are unnesting is a multi-value
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an
implicit ARRAY type. You can also specify any expression that has an SQL array
datatype. For example, you can call UNNEST on the following:
+ * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions.
+ * `ARRAY_CONCAT(dim1,dim2)` if you have to concatenate two multi-value
dimensions.
+* The `AS table_alias_name(column_alias_name)` clause is not required but is
highly recommended. Use it to specify the output, which can be an existing
column or a new one. Replace `table_alias_name` and `column_alias_name` with a
table and column name you want to alias the unnested results to. If you don't
provide this, Druid uses a nondescriptive name, such as `EXPR$0`.
+
+Keep these two things in mind when writing your query:
Review Comment:
```suggestion
Keep the following things in mind when writing your query:
```
##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
SELECT * FROM nested_data
```
-In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries,
such as Scan.
+In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries
against the unnested records. Depending on the type of queries you write, see
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using
native queries](#unnest-using-native-queries).
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`.
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records
in `dim3` to arrays. It is required since `dim3` is a multi-value string
dimension.
+
+If the column you are unnesting is not a string dimension, then you do not
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The
following query returns the two source columns and a third virtual column
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the
total number of rows has grown. The table `nested_data` had only seven rows
originally.
Review Comment:
```suggestion
The virtual column `d45` is the product of the two source columns. Notice
how the total number of rows has grown. The table `nested_data` had only seven
rows originally.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]