vtlim commented on code in PR #12922:
URL: https://github.com/apache/druid/pull/12922#discussion_r950527131
##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the
default mode), Druid use
When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued
logic for
[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual
columns or `expression` filters.
However, even in this mode, Druid uses two-valued logic for filter types other
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can
extract nested values, parse from string,
+serialize to string, and to create new `COMPLEX<json>` structures.
+
+`COMPLEX` types in general currently have limited functionality outside of the
use of the specialized functions which
+understand them, and so have undefined behavior when:
+* grouping on complex values
+* filtered directly on complex values, e.g. `WHERE json is NULL`
+* used as inputs to aggregators without specialized handling for a specific
complex type
+
+In many cases, these functions are provided for translating these `COMPLEX`
value types a `STRING`, which serves as
Review Comment:
```suggestion
In many cases, functions are provided to translate `COMPLEX` value types to
`STRING`, which serves as a
```
##########
docs/querying/sql-functions.md:
##########
@@ -647,6 +647,46 @@ Parses `address` into an IPv4 address stored as an integer.
Converts `address` into an IPv4 address in dot-decimal notation.
+## JSON_KEYS
+
+**Function type:** [JSON](sql-json-functions.md)
+
+`JSON_KEYS(expr, path)`
+
+Returns an array of field names from `expr` at the specified `path`.
+
+## JSON_OBJECT
+
+**Function type:** [JSON](sql-json-functions.md)
+
+`JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])`
+
+Constructs a new `COMPLEX<json>` object. The `KEY` expressions must evaluate
to string types. The `VALUE` expressions can be composed of any input type,
including other `COMPLEX<json>` values. `JSON_OBJECT` can accept alternating
key-value pairs separated by colons. The following syntax is equivalent:
`JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`.
Review Comment:
```suggestion
Constructs a new `COMPLEX<json>` object. The `KEY` expressions must evaluate
to string types. The `VALUE` expressions can be composed of any input type,
including other `COMPLEX<json>` values. `JSON_OBJECT` can accept
colon-separated key-value pairs. The following syntax is equivalent:
`JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`.
```
Technically the key-value pairs are separated by commas but it might be too
wordy to say "colon-separated" then "separated by commas"
##########
docs/querying/virtual-columns.md:
##########
@@ -80,4 +82,106 @@ The expression virtual column has the following syntax:
|--------|-----------|---------|
|name|The name of the virtual column.|yes|
|expression|An [expression](../misc/math-expr.md) that takes a row as input
and outputs a value for the virtual column.|yes|
-|outputType|The expression's output will be coerced to this type. Can be LONG,
FLOAT, DOUBLE, or STRING.|no, default is FLOAT|
+|outputType|The expression's output will be coerced to this type. Can be LONG,
FLOAT, DOUBLE, STRING, ARRAY types, or COMPLEX types.|no, default is FLOAT|
+
+
+### Nested field virtual column
+
+The nested field virtual column is an optimized virtual column that can
provide direct access into various paths of
+a `COMPLEX<json>` column, including using their indexes.
+
+Syntax (all 3 of these virtual columns produce the same output):
+```json
+ {
+ "type": "nested-field",
+ "columnName": "shipTo",
+ "outputName": "v0",
+ "expectedType": "STRING",
+ "path": "$.phoneNumbers[1].number"
+ }
+```
+```json
+ {
+ "type": "nested-field",
+ "columnName": "shipTo",
+ "outputName": "v1",
+ "expectedType": "STRING",
+ "path": ".phoneNumbers[1].number",
+ "useJqSyntax": true
+ }
+```
+
+```json
+ {
+ "type": "nested-field",
+ "columnName": "shipTo",
+ "outputName": "v2",
+ "expectedType": "STRING",
+ "pathParts": [
+ {
+ "type": "field",
+ "field": "phoneNumbers"
+ },
+ {
+ "type": "arrayElement",
+ "index": 1
+ },
+ {
+ "type": "field",
+ "field": "number"
+ }
+ ]
+ }
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|columnName|The name of the virtual column.|yes|
+|outputName|The name of the virtual column.|yes|
+|expectedType|The name of the virtual column.|yes|
+|pathParts|The name of the virtual column.|yes|
+|processFromRaw|If set to true, the virtual column will process the "raw" JSON
data to extract values rather than using an optimized "literal" value selector.
This option allows extracting non-literal values (such as nested JSON objects
or arrays) as a `COMPLEX<json>` at the cost of much slower performance.|No,
default false|
+|path|'JSONPath' or 'jq' syntax path. One of `path` or `pathParts` must be
set|no, if `pathParts` is defined|
+|useJqSyntax||no, default is false|
+
+#### Nested path part
+|property|description|required?|
+|--------|-----------|---------|
+|type|Must be 'field' or 'arrayElement'|yes|
+|field|The name of the 'field' in a 'field' `type` path part|yes, if `type` is
'field'|
+|index|The array element index if `type` is `arrayElement`|yes, if `type` is
'arrayElement'|
+
+This virtual column is used for the SQL operators `JSON_VALUE` (if
`processFromRaw` is set to false) or `JSON_QUERY`
Review Comment:
ah true. the helpful part for the intro was more so on how you can use
JSONPath or jq syntax to extract part of the json
##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the
default mode), Druid use
When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued
logic for
[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual
columns or `expression` filters.
However, even in this mode, Druid uses two-valued logic for filter types other
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can
extract nested values, parse from string,
+serialize to string, and to create new `COMPLEX<json>` structures.
Review Comment:
```suggestion
serialize to string, and create new `COMPLEX<json>` structures.
```
##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the
default mode), Druid use
When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued
logic for
[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual
columns or `expression` filters.
However, even in this mode, Druid uses two-valued logic for filter types other
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can
extract nested values, parse from string,
Review Comment:
```suggestion
Druid supports storing nested data structures in segments using the native
`COMPLEX<json>` type. You can interact
with this data using [JSON functions](sql-json-functions.md), which can
extract nested values, parse from string,
```
##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the
default mode), Druid use
When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued
logic for
[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual
columns or `expression` filters.
However, even in this mode, Druid uses two-valued logic for filter types other
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can
extract nested values, parse from string,
+serialize to string, and to create new `COMPLEX<json>` structures.
+
+`COMPLEX` types in general currently have limited functionality outside of the
use of the specialized functions which
+understand them, and so have undefined behavior when:
+* grouping on complex values
+* filtered directly on complex values, e.g. `WHERE json is NULL`
Review Comment:
```suggestion
* filtered directly on complex values, such as `WHERE json is NULL`
```
##########
docs/misc/math-expr.md:
##########
@@ -227,6 +226,34 @@ map((x) -> x + 1, x)
```
in this case, the `x` when evaluating `x + 1` is the lambda argument, thus an
element of the multi-valued column `x`, rather than the column `x` itself.
+
+## JSON functions
+JSON functions provide facilities to extract, transform, and create
`COMPLEX<json>` values.
+
+| function | description |
+|---|---|
+| json_value(expr, path) | Extract a Druid literal (`STRING`, `LONG`,
`DOUBLE`) value from `expr` using JSONPath syntax of `path` |
+| json_query(expr, path) | Extract a `COMPLEX<json>` value from `expr` using
JSONPath syntax of `path` |
+| json_object(expr1, expr2[, expr3, expr4 ...]) | Construct a `COMPLEX<json>`
with alternating 'key' and 'value' arguments|
+| parse_json(expr) | Deserialize a JSON `STRING` into a `COMPLEX<json>`. If
the input is not a `STRING` or it is invalid JSON, this function will result in
an error.|
+| try_parse_json(expr) | Deserialize a JSON `STRING` into a `COMPLEX<json>`.
If the input is not a `STRING` or it is invalid JSON, this function will result
in a `NULL` value. |
+| to_json_string(expr) | Convert `expr` into a JSON `STRING` value |
+| json_keys(expr, path) | get array of field names from `expr` at the
specified JSONPath `path`, or null if the data does not exist or have any
fields |
+| json_paths(expr) | get array of all JSONPath paths available from `expr` |
Review Comment:
```suggestion
| json_keys(expr, path) | Get array of field names from `expr` at the
specified JSONPath `path`, or null if the data does not exist or have any
fields |
| json_paths(expr) | Get array of all JSONPath paths available from `expr` |
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]