somu-imply commented on code in PR #13736:
URL: https://github.com/apache/druid/pull/13736#discussion_r1100675117
##########
docs/querying/sql.md:
##########
@@ -82,6 +83,29 @@ FROM clause, metadata tables are not considered datasources.
They exist only in
For more information about table, lookup, query, and join datasources, refer
to the [Datasources](datasource.md)
documentation.
+## UNNEST
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest
datasource](./datasource.md#unnest). The source for UNNEST can be an array or
an input that's been transformed into an array, such as with helper functions
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS
table_alias_name(column_alias_name)
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the
following:
+ * A table, such as `FROM a_table`.
+ * A subset of a table based on a query, a filter, or a JOIN. For example,
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+ * An inline array, which is treated as the `datasource` and the
`source_expression`, such as `FROM UNNEST(ARRAY[1,2,3])`.
Review Comment:
If we want to unnest a constant inline expression like [1,2,3] we do not
need the datasource. The query comes up as `select * FROM
UNNEST(ARRAY[1,2,3])`. We need to make the distinction that Unnest can be used
independently if you are operating on an explicit inline data source
##########
docs/querying/sql.md:
##########
@@ -55,6 +55,7 @@ Druid SQL supports SELECT queries with the following
structure:
[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]
SELECT [ ALL | DISTINCT ] { * | exprs }
FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }
+[, UNNEST(source_expression) as table_alias_name(column_alias_name) ]
Review Comment:
This is the right one, we should do the previous thing the same name
<table_alias_name>
##########
docs/querying/sql-functions.md:
##########
@@ -1357,6 +1357,13 @@ Truncates a numerical expression to a specific number of
decimal digits.
Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON
values when processing them, translating stringified JSON into a nested
structure. If the input is not a `VARCHAR` or it is invalid JSON, this function
will result in a `NULL` value.
+## UNNEST
+
+`UNNEST(source)) as UNNESTED (target)`
Review Comment:
I am not a fan of calling the table alias as `UNNESTED` always, this can be
anything the user wants
##########
docs/querying/sql.md:
##########
@@ -82,6 +83,29 @@ FROM clause, metadata tables are not considered datasources.
They exist only in
For more information about table, lookup, query, and join datasources, refer
to the [Datasources](datasource.md)
documentation.
+## UNNEST
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest
datasource](./datasource.md#unnest). The source for UNNEST can be an array or
an input that's been transformed into an array, such as with helper functions
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS
table_alias_name(column_alias_name)
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the
following:
+ * A table, such as `FROM a_table`.
+ * A subset of a table based on a query, a filter, or a JOIN. For example,
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+ * An inline array, which is treated as the `datasource` and the
`source_expression`, such as `FROM UNNEST(ARRAY[1,2,3])`.
+* The `source_expression` for the UNNEST function must be an array and can
come from any expression. If the dimension you are unnesting is a multi-value
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an
implicit ARRAY type. You can also specify any expression that has an SQL array
datatype. For example, you cancall UNNEST on the following:
+ * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions.
+ * `ARRAY_CONCAT(dim1,dim2)` if you have to concatenate two multi-value
dimensions.
+* The `AS table_alias_name(column_alias_name)` clause is not required but is
highly recommended. Use it to specify the output, which can be an existing
column or a new one. Replace `table_alias_name` and `column_alias_name` with a
table and column name you want to alias the unnested results to. If you don't
provide this, Druid uses a nondescriptive name, such as `EXPR$0`.
+
+Notice the comma between the datasource and the UNNEST function. This is
needed in most cases of the UNNEST function. Specifically, it is not needed
when you're unnesting an inline array since the array itself is the datasource.
Review Comment:
This is correct the same should be reflected in the syntax correctly
##########
docs/querying/sql-functions.md:
##########
@@ -1357,6 +1357,13 @@ Truncates a numerical expression to a specific number of
decimal digits.
Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON
values when processing them, translating stringified JSON into a nested
structure. If the input is not a `VARCHAR` or it is invalid JSON, this function
will result in a `NULL` value.
+## UNNEST
+
+`UNNEST(source)) as UNNESTED (target)`
Review Comment:
Here are a couple of things on SQL unnest:
1. The unnest sql function does not remove any duplicates/nulls in an array.
Nulls will be treated as any other value in an array. If there are multiple
nulls within the array, a record corresponding to each of the null will be
created
2. The native unnest has an option of specifying an allowList which cannot
be specified through the SQL counterpart
3. Unnest does not work on arrays inside complex JSON types yet.
4. Unnest cannot be used at ingestion time
5. Unnest preserves the ordering in the array which is being unnested
6. Unnest is not supported in MSQ yet (some work needs to be done there)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]