This is an automated email from the ASF dual-hosted git repository.
victoria pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 7e572eef08 docs: sql unnest and cleanup unnest datasource (#13736)
7e572eef08 is described below
commit 7e572eef086cd47607d680e607954e293fecf9c4
Author: 317brian <[email protected]>
AuthorDate: Tue Apr 4 13:07:54 2023 -0700
docs: sql unnest and cleanup unnest datasource (#13736)
Co-authored-by: Elliott Freis
<[email protected]>
Co-authored-by: Charles Smith <[email protected]>
Co-authored-by: Victoria Lim <[email protected]>
Co-authored-by: Katya Macedo <[email protected]>
Co-authored-by: Paul Rogers <[email protected]>
Co-authored-by: Jill Osborne <[email protected]>
Co-authored-by: Anshu Makkar
<[email protected]>
Co-authored-by: Abhishek Agarwal
<[email protected]>
Co-authored-by: Elliott Freis
<[email protected]>
Co-authored-by: Nicholas Lippis <[email protected]>
Co-authored-by: Rohan Garg <[email protected]>
Co-authored-by: Karan Kumar <[email protected]>
Co-authored-by: Vadim Ogievetsky <[email protected]>
Co-authored-by: Gian Merlino <[email protected]>
Co-authored-by: Clint Wylie <[email protected]>
Co-authored-by: Adarsh Sanjeev <[email protected]>
Co-authored-by: Laksh Singla <[email protected]>
---
docs/querying/datasource.md | 5 +-
docs/querying/sql-functions.md | 7 +
docs/querying/sql.md | 38 ++
...est-datasource.md => tutorial-unnest-arrays.md} | 386 ++++++++++-----------
website/.spelling | 2 +
website/sidebars.json | 2 +-
6 files changed, 228 insertions(+), 212 deletions(-)
diff --git a/docs/querying/datasource.md b/docs/querying/datasource.md
index e3eacf032b..fd3984cb53 100644
--- a/docs/querying/datasource.md
+++ b/docs/querying/datasource.md
@@ -397,7 +397,7 @@ When you use the `unnest` datasource, the unnested column
looks like this:
When unnesting data, keep the following in mind:
- The total number of rows will grow to accommodate the new rows that the
unnested data occupy.
-- You can unnest the values in more than one column in a single `unnest`
datasource. This can lead to a very large number of new rows depending on your
dataset. You can see an example of this in the [unnest
tutorial](../tutorials/tutorial-unnest-datasource.md#unnest-multiple-columns).
+- You can unnest the values in more than one column in a single `unnest`
datasource, but this can lead to a very large number of new rows depending on
your dataset.
The `unnest` datasource uses the following syntax:
@@ -410,6 +410,7 @@ The `unnest` datasource uses the following syntax:
},
"virtualColumn": {
"type": "expression",
+ "name": "output_column",
"expression": "\"column_reference\""
},
"outputName": "unnested_target_column"
@@ -421,4 +422,4 @@ The `unnest` datasource uses the following syntax:
* `dataSource.base.type`: The type of datasource you want to unnest, such as
a table.
* `dataSource.virtualColumn`: [Virtual column](virtual-columns.md) that
references the nested values. The output name of this column is reused as the
name of the column that contains unnested values. You can replace the source
column with the unnested column by specifying the source column's name or a new
column by specifying a different name. Outputting it to a new column can help
you verify that you get the results that you expect but isn't required.
-To learn more about how to use the `unnest` datasource, see the [unnest
tutorial](../tutorials/tutorial-unnest-datasource.md).
+To learn more about how to use the `unnest` datasource, see the [unnest
tutorial](../tutorials/tutorial-unnest-arrays.md).
diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md
index a4a2de0944..9f5a82deb1 100644
--- a/docs/querying/sql-functions.md
+++ b/docs/querying/sql-functions.md
@@ -1399,6 +1399,13 @@ Truncates a numerical expression to a specific number of
decimal digits.
Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON
values when processing them, translating stringified JSON into a nested
structure. If the input is not a `VARCHAR` or it is invalid JSON, this function
will result in a `NULL` value.
+## UNNEST
+
+`UNNEST(source_expression) as table_alias_name(column_alias_name)`
+
+Unnests a source expression that includes arrays into a target column with an
aliased name.
+
+For more information, see [UNNEST](./sql.md#unnest).
## UPPER
diff --git a/docs/querying/sql.md b/docs/querying/sql.md
index 93f3c4f24f..18c3b84272 100644
--- a/docs/querying/sql.md
+++ b/docs/querying/sql.md
@@ -55,6 +55,7 @@ Druid SQL supports SELECT queries with the following
structure:
[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]
SELECT [ ALL | DISTINCT ] { * | exprs }
FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }
+[, UNNEST(source_expression) as table_alias_name(column_alias_name) ]
[ WHERE expr ]
[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE
(exprs) ] ]
[ HAVING expr ]
@@ -82,6 +83,43 @@ FROM clause, metadata tables are not considered datasources.
They exist only in
For more information about table, lookup, query, and join datasources, refer
to the [Datasources](datasource.md)
documentation.
+## UNNEST
+
+> The UNNEST SQL function is [experimental](../development/experimental.md).
Its API and behavior are subject
+> to change in future releases. It is not recommended to use this feature in
production at this time.
+
+The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest
datasource](./datasource.md#unnest). The source for UNNEST can be an array or
an input that's been transformed into an array, such as with helper functions
like MV_TO_ARRAY or ARRAY.
+
+The following is the general syntax for UNNEST, specifically a query that
returns the column that gets unnested:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS
table_alias_name2(column_alias_name2), ...
+```
+
+* The `datasource` for UNNEST can be any Druid datasource, such as the
following:
+ * A table, such as `FROM a_table`.
+ * A subset of a table based on a query, a filter, or a JOIN. For example,
`FROM (SELECT columnA,columnB,columnC from a_table)`.
+* The `source_expression` for the UNNEST function must be an array and can
come from any expression. If the dimension you are unnesting is a multi-value
dimension, you have to specify `MV_TO_ARRAY(dimension)` to convert it to an
implicit ARRAY type. You can also specify any expression that has an SQL array
datatype. For example, you can call UNNEST on the following:
+ * `ARRAY[dim1,dim2]` if you want to make an array out of two dimensions.
+ * `ARRAY_CONCAT(dim1,dim2)` if you want to concatenate two multi-value
dimensions.
+* The `AS table_alias_name(column_alias_name)` clause is not required but is
highly recommended. Use it to specify the output, which can be an existing
column or a new one. Replace `table_alias_name` and `column_alias_name` with a
table and column name you want to alias the unnested results to. If you don't
provide this, Druid uses a nondescriptive name, such as `EXPR$0`.
+
+Keep the following things in mind when writing your query:
+
+- You must include the context parameter `"enableUnnest": true`.
+- You can unnest multiple source expressions in a single query.
+- Notice the comma between the datasource and the UNNEST function. This is
needed in most cases of the UNNEST function. Specifically, it is not needed
when you're unnesting an inline array since the array itself is the datasource.
+- If you view the native explanation of a SQL UNNEST, you'll notice that Druid
uses `j0.unnest` as a virtual column to perform the unnest. An underscore is
added for each unnest, so you may notice virtual columns named `_j0.unnest` or
`__j0.unnest`.
+- UNNEST preserves the ordering of the source array that is being unnested.
+
+For examples, see the [Unnest arrays
tutorial](../tutorials/tutorial-unnest-arrays.md).
+
+The UNNEST function has the following limitations:
+
+- The function does not remove any duplicates or nulls in an array. Nulls will
be treated as any other value in an array. If there are multiple nulls within
the array, a record corresponding to each of the nulls gets created.
+- Arrays inside complex JSON types are not supported.
+- You cannot perform an UNNEST at ingestion time, including SQL-based
ingestion using the MSQ task engine.
+
## WHERE
The WHERE clause refers to columns in the FROM table, and will be translated
to [native filters](filters.md). The
diff --git a/docs/tutorials/tutorial-unnest-datasource.md
b/docs/tutorials/tutorial-unnest-arrays.md
similarity index 57%
rename from docs/tutorials/tutorial-unnest-datasource.md
rename to docs/tutorials/tutorial-unnest-arrays.md
index 95769a4f69..954142f4fa 100644
--- a/docs/tutorials/tutorial-unnest-datasource.md
+++ b/docs/tutorials/tutorial-unnest-arrays.md
@@ -1,7 +1,7 @@
---
-id: tutorial-unnest-datasource
-sidebar_label: "Using the unnest datasource"
-title: "Tutorial: Unnest data in a column"
+id: tutorial-unnest-arrays
+sidebar_label: "Unnesting arrays"
+title: "Unnest arrays within a column"
---
<!--
@@ -25,7 +25,7 @@ title: "Tutorial: Unnest data in a column"
> If you're looking for information about how to unnest `COMPLEX<json>`
> columns, see [Nested columns](../querying/nested-columns.md).
-> The unnest datasource is [experimental](../development/experimental.md). Its
API and behavior are subject
+> The unnest datasource and UNNEST SQL function are
[experimental](../development/experimental.md). Their API and behavior are
subject
> to change in future releases. It is not recommended to use this feature in
> production at this time.
This tutorial demonstrates how to use the unnest datasource to unnest a column
that has data stored in arrays. For example, if you have a column named `dim3`
with values like `[a,b]` or `[c,d,f]`, the unnest datasource can output the
data to a new column with individual rows that contain single values like `a`
and `b`. When doing this, be mindful of the following:
@@ -37,14 +37,14 @@ You can use the Druid console or API to unnest data. To
start though, you may w
## Prerequisites
-You need a Druid cluster, such as the [micro-quickstart](./index.md). The
cluster does not need any existing datasources. You'll load a basic one as part
of this tutorial.
+You need a Druid cluster, such as the [quickstart](./index.md). The cluster
does not need any existing datasources. You'll load a basic one as part of this
tutorial.
## Load data with nested values
The data you're ingesting contains a handful of rows that resemble the
following:
```
-t:2000-01-01, m1:1.0, m2:1.0, dim1:, dim2:[a], dim3:[a,b]
+t:2000-01-01, m1:1.0, m2:1.0, dim1:, dim2:[a], dim3:[a,b], dim4:[x,y],
dim5:[a,b]
```
The focus of this tutorial is on the nested array of values in `dim3`.
@@ -62,13 +62,15 @@ SELECT
dim1,
dim2,
dim3,
+ dim4,
+ dim5,
m1,
m2
FROM TABLE(
EXTERN(
-
'{"type":"inline","data":"{\"t\":\"2000-01-01\",\"m1\":\"1.0\",\"m2\":\"1.0\",\"dim1\":\"\",\"dim2\":[\"a\"],\"dim3\":[\"a\",\"b\"]},\n{\"t\":\"2000-01-02\",\"m1\":\"2.0\",\"m2\":\"2.0\",\"dim1\":\"10.1\",\"dim2\":[],\"dim3\":[\"c\",\"d\"]},\n{\"t\":\"2000-01-03\",\"m1\":\"3.0\",\"m2\":\"3.0\",\"dim1\":\"2\",\"dim2\":[\"\"],\"dim3\":[\"e\",\"f\"]},\n{\"t\":\"2001-01-01\",\"m1\":\"4.0\",\"m2\":\"4.0\",\"dim1\":\"1\",\"dim2\":[\"a\"],\"dim3\":[\"g\",\"h\"]},\n{\"t\":\"2001-01-02\",\"m1
[...]
+
'{"type":"inline","data":"{\"t\":\"2000-01-01\",\"m1\":\"1.0\",\"m2\":\"1.0\",\"dim1\":\"\",\"dim2\":[\"a\"],\"dim3\":[\"a\",\"b\"],\"dim4\":[\"x\",\"y\"],\"dim5\":[\"a\",\"b\"]},\n{\"t\":\"2000-01-02\",\"m1\":\"2.0\",\"m2\":\"2.0\",\"dim1\":\"10.1\",\"dim2\":[],\"dim3\":[\"c\",\"d\"],\"dim4\":[\"e\",\"f\"],\"dim5\":[\"a\",\"b\",\"c\",\"d\"]},\n{\"t\":\"2001-01-03\",\"m1\":\"6.0\",\"m2\":\"6.0\",\"dim1\":\"abc\",\"dim2\":[\"a\"],\"dim3\":[\"k\",\"l\"]},\n{\"t\":\"2001-01-01\",\"m1\":
[...]
'{"type":"json"}',
-
'[{"name":"t","type":"string"},{"name":"dim1","type":"string"},{"name":"dim2","type":"string"},{"name":"dim3","type":"string"},{"name":"m1","type":"float"},{"name":"m2","type":"double"}]'
+
'[{"name":"t","type":"string"},{"name":"dim1","type":"string"},{"name":"dim2","type":"string"},{"name":"dim3","type":"string"},{"name":"dim4","type":"string"},{"name":"dim5","type":"string"},{"name":"m1","type":"float"},{"name":"m2","type":"double"}]'
)
)
PARTITIONED BY YEAR
@@ -84,7 +86,7 @@ PARTITIONED BY YEAR
"type": "index_parallel",
"inputSource": {
"type": "inline",
- "data":
"{\"t\":\"2000-01-01\",\"m1\":\"1.0\",\"m2\":\"1.0\",\"dim1\":\"\",\"dim2\":[\"a\"],\"dim3\":[\"a\",\"b\"]},\n{\"t\":\"2000-01-02\",\"m1\":\"2.0\",\"m2\":\"2.0\",\"dim1\":\"10.1\",\"dim2\":[],\"dim3\":[\"c\",\"d\"]},\n{\"t\":\"2000-01-03\",\"m1\":\"3.0\",\"m2\":\"3.0\",\"dim1\":\"2\",\"dim2\":[\"\"],\"dim3\":[\"e\",\"f\"]},\n{\"t\":\"2001-01-01\",\"m1\":\"4.0\",\"m2\":\"4.0\",\"dim1\":\"1\",\"dim2\":[\"a\"],\"dim3\":[\"g\",\"h\"]},\n{\"t\":\"2001-01-02\",\"m1\":\"5.0\",\"
[...]
+
"data":"{\"t\":\"2000-01-01\",\"m1\":\"1.0\",\"m2\":\"1.0\",\"dim1\":\"\",\"dim2\":[\"a\"],\"dim3\":[\"a\",\"b\"],\"dim4\":[\"x\",\"y\"],\"dim5\":[\"a\",\"b\"]},\n{\"t\":\"2000-01-02\",\"m1\":\"2.0\",\"m2\":\"2.0\",\"dim1\":\"10.1\",\"dim2\":[],\"dim3\":[\"c\",\"d\"],\"dim4\":[\"e\",\"f\"],\"dim5\":[\"a\",\"b\",\"c\",\"d\"]},\n{\"t\":\"2001-01-03\",\"m1\":\"6.0\",\"m2\":\"6.0\",\"dim1\":\"abc\",\"dim2\":[\"a\"],\"dim3\":[\"k\",\"l\"]},\n{\"t\":\"2001-01-01\",\"m1\":\"4.0\",\"m2\"
[...]
},
"inputFormat": {
"type": "json"
@@ -112,7 +114,9 @@ PARTITIONED BY YEAR
"dimensions": [
"dim1",
"dim2",
- "dim3"
+ "dim3",
+ "dim4",
+ "dim5"
]
},
"metricsSpec": [
@@ -141,12 +145,128 @@ Now that the data is loaded, run the following query:
SELECT * FROM nested_data
```
-In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries,
such as Scan.
+In the results, notice that the column named `dim3` has nested values like
`["a","b"]`. The example queries that follow unnest `dim3` and run queries
against the unnested records. Depending on the type of queries you write, see
either [Unnest using SQL queries](#unnest-using-sql-queries) or [Unnest using
native queries](#unnest-using-native-queries).
-## Unnest a single column
+## Unnest using SQL queries
+
+The following is the general syntax for UNNEST:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression) AS
table_alias_name(column_alias_name)
+```
+
+In addition, you must supply the following context parameter:
+
+```json
+"enableUnnest": "true"
+```
+
+For more information about the syntax, see [UNNEST](../querying/sql.md#unnest).
+
+### Unnest a single source expression in a datasource
+
+The following query returns a column called `d3` from the table `nested_data`.
`d3` contains the unnested values from the source column `dim3`:
+
+```sql
+SELECT d3 FROM "nested_data", UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+Notice the MV_TO_ARRAY helper function, which converts the multi-value records
in `dim3` to arrays. It is required since `dim3` is a multi-value string
dimension.
+
+If the column you are unnesting is not a string dimension, then you do not
need to use the MV_TO_ARRAY helper function.
+
+### Unnest a virtual column
+
+You can unnest into a virtual column (multiple columns treated as one). The
following query returns the two source columns and a third virtual column
containing the unnested data:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY[dim4,dim5]) AS
example_table(d45)
+```
+
+The virtual column `d45` is the product of the two source columns. Notice how
the total number of rows has grown. The table `nested_data` had only seven rows
originally.
+
+Another way to unnest a virtual column is to concatenate them with
ARRAY_CONCAT:
+
+```sql
+SELECT dim4,dim5,d45 FROM nested_data, UNNEST(ARRAY_CONCAT(dim4,dim5)) AS
example_table(d45)
+```
+
+Decide which method to use based on what your goals are.
+
+### Unnest multiple source expressions
+
+You can include multiple UNNEST clauses in a single query. Each `UNNEST`
clause needs the following:
+
+```sql
+UNNEST(source_expression) AS table_alias_name(column_alias_name)
+```
+
+The `table_alias_name` and `column_alias_name` for each UNNEST clause should
be unique.
+
+The example query returns the following from the `nested_data` datasource:
+
+- the source columns `dim3`, `dim4`, and `dim5`
+- an unnested version of `dim3` aliased to `d3`
+- an unnested virtual column composed of `dim4` and `dim5` aliased to `d45`
+
+```sql
+SELECT dim3,dim4,dim5,d3,d45 FROM "nested_data", UNNEST(MV_TO_ARRAY("dim3"))
AS foo1(d3), UNNEST(ARRAY[dim4,dim5]) AS foo2(d45)
+```
+
+
+### Unnest a column from a subset of a table
+
+The following query uses only three columns from the `nested_data` table as
the datasource. From that subset, it unnests the column `dim3` into `d3` and
returns `d3`.
+
+```sql
+SELECT d3 FROM (SELECT dim1, dim2, dim3 FROM "nested_data"),
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+### Unnest with a filter
+
+You can specify which rows to unnest by including a filter in your query. The
following query:
+
+* Filters the source expression based on `dim2`
+* Unnests the records in `dim3` into `d3`
+* Returns the records for the unnested `d3` that have a `dim2` record that
matches the filter
+
+```sql
+SELECT d3 FROM (SELECT * FROM nested_data WHERE dim2 IN ('abc')),
UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
+```
+
+You can also filter the results of an UNNEST clause. The following example
unnests the inline array `[1,2,3]` but only returns the rows that match the
filter:
+
+```sql
+SELECT * FROM UNNEST(ARRAY[1,2,3]) AS example_table(d1) WHERE d1 IN ('1','2')
+```
+
+This means that you can run a query like the following where Druid only return
rows that meet the following conditions:
+
+- The unnested values of `dim3` (aliased to `d3`) matches `IN ('b', 'd')`
+- The value of `m1` is less than 2.
+
+```sql
+SELECT * FROM nested_data, UNNEST(MV_TO_ARRAY("dim3")) AS foo(d3) WHERE d3 IN
('b', 'd') and m1 < 2
+```
+
+The query only returns a single row since only one row meets the conditions.
You can see the results change if you modify the filter.
+
+### Unnest and then GROUP BY
+
+The following query unnests `dim3` and then performs a GROUP BY on the output
`d3`.
+
+```sql
+SELECT d3 FROM nested_data, UNNEST(MV_TO_ARRAY(dim3)) AS example_table(d3)
GROUP BY d3
+```
+
+You can further transform your results by including clauses like `ORDER BY d3
DESC` or LIMIT.
+
+## Unnest using native queries
The following section shows examples of how you can use the unnest datasource
in queries. They all use the `nested_data` table you created earlier in the
tutorial.
+You can use a single unnest datasource to unnest multiple columns. Be careful
when doing this though because it can lead to a very large number of new rows.
+
### Scan query
The following native Scan query returns the rows of the datasource and unnests
the values in the `dim3` column by using the `unnest` datasource type:
@@ -198,9 +318,7 @@ The following native Scan query returns the rows of the
datasource and unnests t
In the results, notice that there are more rows than before and an additional
column named `unnest-dim3`. The values of `unnest-dim3` are the same as the
`dim3` column except the nested values are no longer nested and are each a
separate record.
-With the `dataSource.allowList` parameter, you can unnest a subset of a
column. Set the value of `allowList` to `["a","b"]` and run the query again.
Only a subset of rows are returned based on the values you allowed.
-
-You can also implement filters. For example, you can add the following to the
Scan query to filter results to only rows that have the values `"a"` or `"abc"`
in `"dim2"`:
+You can implement filters. For example, you can add the following to the Scan
query to filter results to only rows that have the values `"a"` or `"abc"` in
`"dim2"`:
```json
"filter": {
@@ -274,7 +392,6 @@ The example topN query unnests `dim3` into the column
`unnest-dim3`. The query u
"name": "unnest-dim3",
"expression": "\"dim3\""
},
- "allowList": null
},
"dimension": {
"type": "default",
@@ -419,143 +536,41 @@ This query joins the `nested_data` table with itself and
outputs the unnested da
</details>
-## Unnest multiple columns
-
-You can use a single unnest datasource to unnest multiple columns. Be careful
when doing this though because it can lead to a very large number of new rows.
-
-### Load data with two columns of nested values
-
-Ingest this new data into a table called `nested_data2`:
-
-<!--DOCUSAURUS_CODE_TABS-->
-
-<!--SQL-based ingestion-->
-
-```sql
-REPLACE INTO "nested_data2" OVERWRITE ALL
-SELECT
- TIME_PARSE("t") as __time,
- "dim1",
- "dim2",
- "dim3",
- "m1",
- "m2"
-FROM TABLE(
- EXTERN(
-
'{"type":"inline","data":"{\"t\":\"2000-01-01\",\"m1\":\"1.0\",\"m2\":\"1.0\",\"dim1\":\"\",\"dim2\":[\"x\",\"y\"],\"dim3\":[\"a\",\"b\"]},\n{\"t\":\"2000-01-02\",\"m1\":\"2.0\",\"m2\":\"2.0\",\"dim1\":\"10.1\",\"dim2\":[\"e\",\"f\"],\"dim3\":[\"a\",\"b\",\"c\",\"d\"]}"}',
- '{"type":"json"}',
-
'[{"name":"t","type":"string"},{"name":"dim1","type":"string"},{"name":"dim2","type":"string"},{"name":"dim3","type":"string"},{"name":"m1","type":"float"},{"name":"m2","type":"double"}]'
- )
-)
-PARTITIONED BY YEAR
-```
-
-<!--Ingestion spec-->
-
-```json
-{
- "type": "index_parallel",
- "spec": {
- "ioConfig": {
- "type": "index_parallel",
- "inputSource": {
- "type": "inline",
- "data":
"{\"t\":\"2000-01-01\",\"m1\":\"1.0\",\"m2\":\"1.0\",\"dim1\":\"\",\"dim2\":[\"x\",\"y\"],\"dim3\":[\"a\",\"b\"]},\n{\"t\":\"2000-01-02\",\"m1\":\"2.0\",\"m2\":\"2.0\",\"dim1\":\"10.1\",\"dim2\":[\"e\",\"f\"],\"dim3\":[\"a\",\"b\",\"c\",\"d\"]}"
- },
- "inputFormat": {
- "type": "json"
- }
- },
- "tuningConfig": {
- "type": "index_parallel",
- "partitionsSpec": {
- "type": "dynamic"
- }
- },
- "dataSchema": {
- "dataSource": "nested_data2",
- "granularitySpec": {
- "type": "uniform",
- "queryGranularity": "NONE",
- "rollup": false,
- "segmentGranularity": "YEAR"
- },
- "timestampSpec": {
- "column": "t",
- "format": "auto"
- },
- "dimensionsSpec": {
- "dimensions": [
- "dim1",
- "dim2",
- "dim3"
- ]
- },
- "metricsSpec": [
- {
- "name": "m1",
- "type": "floatSum",
- "fieldName": "m1"
- },
- {
- "name": "m2",
- "type": "doubleSum",
- "fieldName": "m2"
- }
- ]
- }
- }
-}
-```
-
-<!--END_DOCUSAURUS_CODE_TABS-->
+### Unnest a virtual column
+The `unnest` datasource supports unnesting virtual columns, which is a
queryable composite column that can draw data from multiple source columns.
-### Unnest `nested_data2`
-
-The following query performs two unnests. It unnests `dim3` into a column
named `unnest-dim3`. It also performs an unnest on `dim2` and outputs the
results to `unnest-dim2`. You can then treat the combination of `unnest-dim3`
and `unnest-dim2` as Cartesian products.
-
-When you run the query, pay special attention to how the total number of rows
has grown drastically. The source data has 2 rows. The unnested data has 12
rows, (2 x 2) + (2 x 4).
+The following query returns the columns `dim45` and `m1`. The `dim45` column
is the unnested version of a virtual column that contains an array of the
`dim4` and `dim5` columns.
<details><summary>Show the query</summary>
```json
{
"queryType": "scan",
- "dataSource": {
+ "dataSource":{
"type": "unnest",
"base": {
- "type": "unnest",
- "base": {
- "type": "table",
- "name": "nested_data2"
- },
+ "type": "table",
+ "name": "nested_data"
+ },
"virtualColumn": {
"type": "expression",
- "name": "unnest-dim3",
- "expression": "\"dim3\""
- },
- "allowList": []
+ "name": "dim45",
+ "expression": "array_concat(\"dim4\",\"dim5\")",
+ "outputType": "ARRAY<STRING>"
},
- "column": "dim2",
- "outputName": "unnest-dim2"
- },
+ }
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
- "limit": 1000,
+ "resultFormat": "compactedList",
+ "limit": 1001,
"columns": [
- "__time",
- "dim1",
- "dim2",
- "dim3",
- "m1",
- "m2",
- "unnest-dim3",
- "unnest-dim2"
+ "dim45",
+ "m1"
],
"legacy": false,
"granularity": {
@@ -570,9 +585,9 @@ When you run the query, pay special attention to how the
total number of rows ha
</details>
-## Unnest inline datasource
+### Unnest a column and a virtual column
-You can also use the `unnest` datasource to unnest an inline datasource. The
following query takes the row `[1,2,3]` in the column `inline_data` that is
provided inline within the query and returns it as unnested values in the
`output` column:
+The following Scan query unnests the column `dim3` into `d3` and a virtual
column composed of `dim4` and `dim5` into the column `d45`. It then returns
those source columns and their unnested variants.
<details><summary>Show the query</summary>
@@ -582,96 +597,47 @@ You can also use the `unnest` datasource to unnest an
inline datasource. The fol
"dataSource": {
"type": "unnest",
"base": {
- "type": "inline",
- "columnNames": [
- "inline_data"
- ],
- "columnTypes": [
- "long_array"
- ],
- "rows": [
- [
- [1,2,3]
- ]
- ]
- },
- "virtualColumn": {
- "type": "expression",
- "name": "output",
- "expression": "\"inline_data\""
- }
- },
- "intervals": {
- "type": "intervals",
- "intervals": [
- "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
- ]
- },
- "resultFormat": "compactedList",
- "limit": 1001,
- "columns": [
- "inline_data",
- "output"
- ],
- "legacy": false,
- "granularity": {
- "type": "all"
- }
-}
-```
-
-</details>
-
-## Unnest a virtual column
-
-The `unnest` datasource supports unnesting a virtual columns, which is a
queryable composite column that can draw data from multiple source columns.
-
-The following Scan query uses the `nested_data2` table you created in [Load
data with two columns of nested
values](#load-data-with-two-columns-of-nested-values). It returns the columns
`unnest-v0` and `m1`. The `unnest-v0` column is the unnested version of the
virtual column `v0`, which contains an array of the `dim2` and `dim3` columns.
-
-<details><summary>Show the query</summary>
+ "type": "unnest",
+ "base": {
+ "type": "table",
+ "name": "nested_data"
+ },
-```json
-{
- "queryType": "scan",
- "dataSource":{
- "type": "unnest",
- "base": {
- "type": "table",
- "name": "nested_data2"
+"virtualColumn": {
+ "type": "expression",
+ "name": "d3",
+ "expression": "\"dim3\"",
+ "outputType": "STRING"
+ },
},
"virtualColumn": {
"type": "expression",
- "name": "unnest-v0",
- "expression": "\"v0\""
- }
- }
+ "name": "d45",
+ "expression": "array(\"dim4\",\"dim5\")",
+ "outputType": "ARRAY<STRING>"
+ },
+ },
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
- "virtualColumns": [
- {
- "type": "expression",
- "name": "v0",
- "expression": "array(\"dim2\",\"dim3\")",
- "outputType": "ARRAY<STRING>"
- }
- ],
"resultFormat": "compactedList",
"limit": 1001,
"columns": [
- "unnest-v0",
- "m1"
+ "dim3",
+ "d3",
+ "dim4",
+ "dim5",
+ "d45"
],
"legacy": false,
"context": {
- "populateCache": false,
- "queryId": "d273facb-08cc-4de7-ac0b-d0b82173e531",
+ "enableUnnest": "true",
+ "queryId": "2618b9ce-6c0d-414e-b88d-16fb59b9c481",
"sqlOuterLimit": 1001,
- "sqlQueryId": "d273facb-08cc-4de7-ac0b-d0b82173e531",
- "useCache": false,
+ "sqlQueryId": "2618b9ce-6c0d-414e-b88d-16fb59b9c481",
"useNativeQueryExplain": true
},
"granularity": {
@@ -684,4 +650,6 @@ The following Scan query uses the `nested_data2` table you
created in [Load data
## Learn more
-For more information about `unnest` and other datasources, see
[Datasources](../querying/datasource.md#unnest).
\ No newline at end of file
+For more information, see the following:
+- [UNNEST SQL function](../querying/sql.md#unnest)
+- [`unnest` in Datasources](../querying/datasource.md#unnest)
diff --git a/website/.spelling b/website/.spelling
index 0e4d255dd2..bf846ba207 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -170,6 +170,7 @@ Montréal
MSQ
Murmur3
MVCC
+MV_TO_ARRAY
NFS
OCF
OIDC
@@ -378,6 +379,7 @@ namespaced
namespaces
natively
netflow
+nondescriptive
nonfinalized
non-null
non-nullable
diff --git a/website/sidebars.json b/website/sidebars.json
index c3bd2afcf1..fbb6bf0866 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -25,7 +25,7 @@
"tutorials/tutorial-msq-convert-spec",
"tutorials/tutorial-jdbc",
"tutorials/tutorial-sql-query-view",
- "tutorials/tutorial-unnest-datasource",
+ "tutorials/tutorial-unnest-arrays",
"tutorials/tutorial-jupyter-index",
"tutorials/tutorial-jdbc"
],
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]