vtlim commented on code in PR #13736:
URL: https://github.com/apache/druid/pull/13736#discussion_r1152555855


##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. 
They exist only in
 For more information about table, lookup, query, and join datasources, refer 
to the [Datasources](datasource.md)
 documentation.
 
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md). 
Their API and behavior are subject
+> to change in future releases. It is not recommended to use this feature in 
production at this time.
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest 
datasource](./datasource.md#unnest). The source for UNNEST can be an array or 
an input that's been transformed into an array, such as with helper functions 
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that 
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS 
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS 
table_alias_name2(column_alias_name2), ...
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the 
following:
+  * A table, such as  `FROM a_table`.
+  * A subset of a table based on a query, a filter, or a JOIN. For example, 
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+* The `source_expression` for the UNNEST function must be an array and can 
come from any expression. If the dimension you are unnesting is a multi-value 
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an 
implicit ARRAY type. You can also specify any expression that has an SQL array 
datatype. For example, you can call UNNEST on the following:
+  * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions. 
+  * `ARRAY_CONCAT(dim1,dim2)` if you have to concatenate two multi-value 
dimensions. 

Review Comment:
   ```suggestion
     * `ARRAY_CONCAT(dim1,dim2)` if you want to concatenate two multi-value 
dimensions. 
   ```



##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. 
They exist only in
 For more information about table, lookup, query, and join datasources, refer 
to the [Datasources](datasource.md)
 documentation.
 
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md). 
Their API and behavior are subject

Review Comment:
   ```suggestion
   > The UNNEST SQL function is [experimental](../development/experimental.md). 
Its API and behavior are subject
   ```



##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. 
They exist only in
 For more information about table, lookup, query, and join datasources, refer 
to the [Datasources](datasource.md)
 documentation.
 
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md). 
Their API and behavior are subject
+> to change in future releases. It is not recommended to use this feature in 
production at this time.
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest 
datasource](./datasource.md#unnest). The source for UNNEST can be an array or 
an input that's been transformed into an array, such as with helper functions 
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that 
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS 
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS 
table_alias_name2(column_alias_name2), ...
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the 
following:
+  * A table, such as  `FROM a_table`.
+  * A subset of a table based on a query, a filter, or a JOIN. For example, 
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+* The `source_expression` for the UNNEST function must be an array and can 
come from any expression. If the dimension you are unnesting is a multi-value 
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an 
implicit ARRAY type. You can also specify any expression that has an SQL array 
datatype. For example, you can call UNNEST on the following:
+  * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions. 
+  * `ARRAY_CONCAT(dim1,dim2)` if you have to concatenate two multi-value 
dimensions. 
+* The `AS table_alias_name(column_alias_name)` clause  is not required but is 
highly recommended. Use it to specify the output, which can be an existing 
column or a new one. Replace `table_alias_name` and `column_alias_name` with a 
table and column name you want to alias the unnested results to. If you don't 
provide this, Druid uses a nondescriptive name, such as `EXPR$0`.
+
+Keep these two things in mind when writing your query:
+
+- You must include the context parameter `"enableUnnest": true`.
+- You can unnest multiple source expressions in a single query.
+- Notice the comma between the datasource and the UNNEST function. This is 
needed in most cases of the UNNEST function. Specifically, it is not needed 
when you're unnesting an inline array since the array itself is the datasource.
+- If you view the native explanation of a SQL UNNEST, you'll notice that Druid 
uses `j0.unnest` as a virtual column to perform the unnest. An underscore is 
added for each unnest, so you may notice virtual columns named `_j0.unnest` or 
`__j0.unnest`.
+
+For examples, see the [Unnest arrays 
tutorial](../tutorials/tutorial-unnest-arrays.md).
+
+The UNNEST function has the following limitations:
+
+- The function does not remove any duplicates or nulls in an array. Nulls will 
be treated as any other value in an array. If there are multiple nulls within 
the array, a record corresponding to each of the nulls gets created.
+- Arrays inside complex JSON types are not supported.
+- You cannot perform an UNNEST at ingestion time, including SQL-based 
ingestion using the MSQ task engine.
+- UNNEST preserves the ordering in the source array that is being unnested.

Review Comment:
   Is this a limitation or rather just a description of how UNNEST functions?



##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
 SELECT * FROM nested_data
 ```
 
-In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries, 
such as Scan. 
+In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries 
against the unnested records. Depending on the type of queries you write, see 
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using 
native queries](#unnest-using-native-queries).
 
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS 
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`. 
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3) 
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records 
in `dim3` to arrays. It is required since `dim3` is a multi-value string 
dimension. 
+
+If the column you are unnesting is not a string dimension, then you do not 
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The 
following query returns the two source columns and a third virtual column 
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS 
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the 
total number of rows has grown. The table `nested_data` had only seven rows 
originally.
+
+Another way to unnest a virtual column is to concatenate them with 
ARRAY_CONCAT:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS 
example_table(d45)
+```
+
+Decide which method to use based on what your goals are. 
+
+### Unnest multiple source expressions
+
+You can include multiple UNNEST clauses in a single query. Each `UNNEST` 
clause needs the following:
+
+```sql
+UNNEST(source_expression) AS table_alias_name(column_alias_name)
+```
+
+The `table_alias_name` and `column_alias_name` for each UNNEST clause should 
be unique.
+
+The example query returns the following from  the `nested_data` datasource:
+
+- the source columns `dim3`, `dim4`, and `dim5`
+- an unnested version of `dim3` aliased to `d3`
+- an unnested virtual column composed of `dim4` and `dim5` aliased to `d45`
+
+```sql
+SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3")) 
AS ud(d3), UNNEST(ARRAY[dim4,dim5]) AS foo(d45)

Review Comment:
   ```suggestion
   SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3")) 
AS foo1(d3), UNNEST(ARRAY[dim4,dim5]) AS foo2(d45)
   ```



##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
 SELECT * FROM nested_data
 ```
 
-In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries, 
such as Scan. 
+In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries 
against the unnested records. Depending on the type of queries you write, see 
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using 
native queries](#unnest-using-native-queries).
 
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS 
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`. 
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3) 
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records 
in `dim3` to arrays. It is required since `dim3` is a multi-value string 
dimension. 
+
+If the column you are unnesting is not a string dimension, then you do not 
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The 
following query returns the two source columns and a third virtual column 
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS 
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the 
total number of rows has grown. The table `nested_data` had only seven rows 
originally.
+
+Another way to unnest a virtual column is to concatenate them with 
ARRAY_CONCAT:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS 
example_table(d45)
+```
+
+Decide which method to use based on what your goals are. 
+
+### Unnest multiple source expressions
+
+You can include multiple UNNEST clauses in a single query. Each `UNNEST` 
clause needs the following:
+
+```sql
+UNNEST(source_expression) AS table_alias_name(column_alias_name)
+```
+
+The `table_alias_name` and `column_alias_name` for each UNNEST clause should 
be unique.
+
+The example query returns the following from  the `nested_data` datasource:
+
+- the source columns `dim3`, `dim4`, and `dim5`
+- an unnested version of `dim3` aliased to `d3`
+- an unnested virtual column composed of `dim4` and `dim5` aliased to `d45`
+
+```sql
+SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3")) 
AS ud(d3), UNNEST(ARRAY[dim4,dim5]) AS foo(d45)
+```
+
+
+### Unnest a column from a subset of a table
+
+The following query uses only three columns from the `nested_data` table as 
the datasource. From that subset, it unnests the column `dim3` into `d3` and 
returns `d3`.
+
+```sql
+SELECT d3 FROM (select dim1, dim2, dim3 from "nested_data"), 
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)

Review Comment:
   ```suggestion
   SELECT d3 FROM (SELECT dim1, dim2, dim3 FROM "nested_data"), 
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
   ```



##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
 SELECT * FROM nested_data
 ```
 
-In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries, 
such as Scan. 
+In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries 
against the unnested records. Depending on the type of queries you write, see 
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using 
native queries](#unnest-using-native-queries).
 
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS 
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`. 
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3) 
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records 
in `dim3` to arrays. It is required since `dim3` is a multi-value string 
dimension. 
+
+If the column you are unnesting is not a string dimension, then you do not 
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The 
following query returns the two source columns and a third virtual column 
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS 
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the 
total number of rows has grown. The table `nested_data` had only seven rows 
originally.
+
+Another way to unnest a virtual column is to concatenate them with 
ARRAY_CONCAT:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS 
example_table(d45)
+```
+
+Decide which method to use based on what your goals are. 
+
+### Unnest multiple source expressions
+
+You can include multiple UNNEST clauses in a single query. Each `UNNEST` 
clause needs the following:
+
+```sql
+UNNEST(source_expression) AS table_alias_name(column_alias_name)
+```
+
+The `table_alias_name` and `column_alias_name` for each UNNEST clause should 
be unique.
+
+The example query returns the following from  the `nested_data` datasource:
+
+- the source columns `dim3`, `dim4`, and `dim5`
+- an unnested version of `dim3` aliased to `d3`
+- an unnested virtual column composed of `dim4` and `dim5` aliased to `d45`
+
+```sql
+SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3")) 
AS ud(d3), UNNEST(ARRAY[dim4,dim5]) AS foo(d45)
+```
+
+
+### Unnest a column from a subset of a table
+
+The following query uses only three columns from the `nested_data` table as 
the datasource. From that subset, it unnests the column `dim3` into `d3` and 
returns `d3`.
+
+```sql
+SELECT d3 FROM (select dim1, dim2, dim3 from "nested_data"), 
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+### Unnest with a filter
+
+You can specify which rows to unnest by including a filter in your query. The 
following query:
+
+* Filters the source expression based on `dim2`
+* Unnests the records in `dim3` into `d3` 
+* Returns the records for  the unnested `d3` that have a `dim2` record that 
matches the filter
+
+```sql
+SELECT d3 FROM (SELECT * FROM nested_data WHERE dim2 IN ('abc')), 
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+You can also filter the results of an UNNEST clause. The following example 
unnests the inline array `[1,2,3]` but only returns the rows that match the 
filter:
+
+```sql
+SELECT * FROM UNNEST(ARRAY[1,2,3]) AS ud(d1) WHERE d1 IN ('1','2')

Review Comment:
   ```suggestion
   SELECT * FROM UNNEST(ARRAY[1,2,3]) AS example_table(d1) WHERE d1 IN ('1','2')
   ```
   Matches the example above



##########
docs/querying/sql.md:
##########
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources. 
They exist only in
 For more information about table, lookup, query, and join datasources, refer 
to the [Datasources](datasource.md)
 documentation.
 
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md). 
Their API and behavior are subject
+> to change in future releases. It is not recommended to use this feature in 
production at this time.
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest 
datasource](./datasource.md#unnest). The source for UNNEST can be an array or 
an input that's been transformed into an array, such as with helper functions 
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that 
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS 
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS 
table_alias_name2(column_alias_name2), ...
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the 
following:
+  * A table, such as  `FROM a_table`.
+  * A subset of a table based on a query, a filter, or a JOIN. For example, 
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+* The `source_expression` for the UNNEST function must be an array and can 
come from any expression. If the dimension you are unnesting is a multi-value 
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an 
implicit ARRAY type. You can also specify any expression that has an SQL array 
datatype. For example, you can call UNNEST on the following:
+  * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions. 
+  * `ARRAY_CONCAT(dim1,dim2)` if you have to concatenate two multi-value 
dimensions. 
+* The `AS table_alias_name(column_alias_name)` clause  is not required but is 
highly recommended. Use it to specify the output, which can be an existing 
column or a new one. Replace `table_alias_name` and `column_alias_name` with a 
table and column name you want to alias the unnested results to. If you don't 
provide this, Druid uses a nondescriptive name, such as `EXPR$0`.
+
+Keep these two things in mind when writing your query:

Review Comment:
   ```suggestion
   Keep the following things in mind when writing your query:
   ```



##########
docs/tutorials/tutorial-unnest-arrays.md:
##########
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
 SELECT * FROM nested_data
 ```
 
-In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries, 
such as Scan. 
+In the results, notice that the column named `dim3` has nested values like 
`["a","b"]`.  The example queries that follow unnest `dim3`  and run queries 
against the unnested records. Depending on the type of queries you write, see 
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using 
native queries](#unnest-using-native-queries).
 
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS 
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`. 
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3) 
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records 
in `dim3` to arrays. It is required since `dim3` is a multi-value string 
dimension. 
+
+If the column you are unnesting is not a string dimension, then you do not 
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The 
following query returns the two source columns and a third virtual column 
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS 
example_table(d45)
+```
+
+This virtual column is the product of the two source columns. Notice how the 
total number of rows has grown. The table `nested_data` had only seven rows 
originally.

Review Comment:
   ```suggestion
   The virtual column `d45` is the product of the two source columns. Notice 
how the total number of rows has grown. The table `nested_data` had only seven 
rows originally.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to