vtlim commented on code in PR #12922:
URL: https://github.com/apache/druid/pull/12922#discussion_r950527131


##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the 
default mode), Druid use
 When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued 
logic for
 [expressions](../misc/math-expr.md) evaluation, such as `expression` virtual 
columns or `expression` filters.
 However, even in this mode, Druid uses two-valued logic for filter types other 
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native 
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can 
extract nested values, parse from string,
+serialize to string, and to create new `COMPLEX<json>` structures.
+
+`COMPLEX` types in general currently have limited functionality outside of the 
use of the specialized functions which
+understand them, and so have undefined behavior when:
+* grouping on complex values
+* filtered directly on complex values, e.g. `WHERE json is NULL`
+* used as inputs to aggregators without specialized handling for a specific 
complex type
+
+In many cases, these functions are provided for translating these `COMPLEX` 
value types a `STRING`, which serves as

Review Comment:
   ```suggestion
   In many cases, functions are provided to translate `COMPLEX` value types to 
`STRING`, which serves as a
   ```



##########
docs/querying/sql-functions.md:
##########
@@ -647,6 +647,46 @@ Parses `address` into an IPv4 address stored as an integer.
 
 Converts `address` into an IPv4 address in dot-decimal notation.
 
+## JSON_KEYS
+
+**Function type:** [JSON](sql-json-functions.md)
+
+`JSON_KEYS(expr, path)`
+
+Returns an array of field names from `expr` at the specified `path`.
+
+## JSON_OBJECT
+
+**Function type:** [JSON](sql-json-functions.md)
+
+`JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])`
+
+Constructs a new `COMPLEX<json>` object. The `KEY` expressions must evaluate 
to string types. The `VALUE` expressions can be composed of any input type, 
including other `COMPLEX<json>` values. `JSON_OBJECT` can accept alternating 
key-value pairs separated by colons. The following syntax is equivalent: 
`JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`.

Review Comment:
   ```suggestion
   Constructs a new `COMPLEX<json>` object. The `KEY` expressions must evaluate 
to string types. The `VALUE` expressions can be composed of any input type, 
including other `COMPLEX<json>` values. `JSON_OBJECT` can accept 
colon-separated key-value pairs. The following syntax is equivalent: 
`JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`.
   ```
   Technically the key-value pairs are separated by commas but it might be too 
wordy to say "colon-separated" then "separated by commas"



##########
docs/querying/virtual-columns.md:
##########
@@ -80,4 +82,106 @@ The expression virtual column has the following syntax:
 |--------|-----------|---------|
 |name|The name of the virtual column.|yes|
 |expression|An [expression](../misc/math-expr.md) that takes a row as input 
and outputs a value for the virtual column.|yes|
-|outputType|The expression's output will be coerced to this type. Can be LONG, 
FLOAT, DOUBLE, or STRING.|no, default is FLOAT|
+|outputType|The expression's output will be coerced to this type. Can be LONG, 
FLOAT, DOUBLE, STRING, ARRAY types, or COMPLEX types.|no, default is FLOAT|
+
+
+### Nested field virtual column
+
+The nested field virtual column is an optimized virtual column that can 
provide direct access into various paths of
+a `COMPLEX<json>` column, including using their indexes.
+
+Syntax (all 3 of these virtual columns produce the same output):
+```json
+    {
+      "type": "nested-field",
+      "columnName": "shipTo",
+      "outputName": "v0",
+      "expectedType": "STRING",
+      "path": "$.phoneNumbers[1].number"
+    }
+```
+```json
+    {
+      "type": "nested-field",
+      "columnName": "shipTo",
+      "outputName": "v1",
+      "expectedType": "STRING",
+      "path": ".phoneNumbers[1].number",
+      "useJqSyntax": true
+    }
+```
+
+```json
+    {
+      "type": "nested-field",
+      "columnName": "shipTo",
+      "outputName": "v2",
+      "expectedType": "STRING",
+      "pathParts": [
+        {
+          "type": "field",
+          "field": "phoneNumbers"
+        },
+        {
+          "type": "arrayElement",
+          "index": 1
+        },
+        {
+          "type": "field",
+          "field": "number"
+        }
+      ]
+    }
+```
+
+|property|description|required?|
+|--------|-----------|---------|
+|columnName|The name of the virtual column.|yes|
+|outputName|The name of the virtual column.|yes|
+|expectedType|The name of the virtual column.|yes|
+|pathParts|The name of the virtual column.|yes|
+|processFromRaw|If set to true, the virtual column will process the "raw" JSON 
data to extract values rather than using an optimized "literal" value selector. 
This option allows extracting non-literal values (such as nested JSON objects 
or arrays) as a `COMPLEX<json>` at the cost of much slower performance.|No, 
default false|
+|path|'JSONPath' or 'jq' syntax path. One of `path` or `pathParts` must be 
set|no, if `pathParts` is defined|
+|useJqSyntax||no, default is false|
+
+#### Nested path part
+|property|description|required?|
+|--------|-----------|---------|
+|type|Must be 'field' or 'arrayElement'|yes|
+|field|The name of the 'field' in a 'field' `type` path part|yes, if `type` is 
'field'|
+|index|The array element index if `type` is `arrayElement`|yes, if `type` is 
'arrayElement'|
+
+This virtual column is used for the SQL operators `JSON_VALUE` (if 
`processFromRaw` is set to false) or `JSON_QUERY`

Review Comment:
   ah true. the helpful part for the intro was more so on how you can use 
JSONPath or jq syntax to extract part of the json



##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the 
default mode), Druid use
 When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued 
logic for
 [expressions](../misc/math-expr.md) evaluation, such as `expression` virtual 
columns or `expression` filters.
 However, even in this mode, Druid uses two-valued logic for filter types other 
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native 
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can 
extract nested values, parse from string,
+serialize to string, and to create new `COMPLEX<json>` structures.

Review Comment:
   ```suggestion
   serialize to string, and create new `COMPLEX<json>` structures.
   ```



##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the 
default mode), Druid use
 When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued 
logic for
 [expressions](../misc/math-expr.md) evaluation, such as `expression` virtual 
columns or `expression` filters.
 However, even in this mode, Druid uses two-valued logic for filter types other 
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native 
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can 
extract nested values, parse from string,

Review Comment:
   ```suggestion
   Druid supports storing nested data structures in segments using the native 
`COMPLEX<json>` type. You can interact
   with this data using [JSON functions](sql-json-functions.md), which can 
extract nested values, parse from string,
   ```



##########
docs/querying/sql-data-types.md:
##########
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the 
default mode), Druid use
 When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued 
logic for
 [expressions](../misc/math-expr.md) evaluation, such as `expression` virtual 
columns or `expression` filters.
 However, even in this mode, Druid uses two-valued logic for filter types other 
than `expression`.
+
+## Nested columns
+Druid supports storing nested data structures in segments using the native 
`COMPLEX<json>` type. This data can be
+interacted with using [JSON functions](sql-json-functions.md), which can 
extract nested values, parse from string,
+serialize to string, and to create new `COMPLEX<json>` structures.
+
+`COMPLEX` types in general currently have limited functionality outside of the 
use of the specialized functions which
+understand them, and so have undefined behavior when:
+* grouping on complex values
+* filtered directly on complex values, e.g. `WHERE json is NULL`

Review Comment:
   ```suggestion
   * filtered directly on complex values, such as `WHERE json is NULL`
   ```



##########
docs/misc/math-expr.md:
##########
@@ -227,6 +226,34 @@ map((x) -> x + 1, x)
 ```
 in this case, the `x` when evaluating `x + 1` is the lambda argument, thus an 
element of the multi-valued column `x`, rather than the column `x` itself.
 
+
+## JSON functions
+JSON functions provide facilities to extract, transform, and create 
`COMPLEX<json>` values. 
+
+| function | description |
+|---|---|
+| json_value(expr, path) | Extract a Druid literal (`STRING`, `LONG`, 
`DOUBLE`) value from `expr` using JSONPath syntax of `path` |
+| json_query(expr, path) | Extract a `COMPLEX<json>` value from `expr` using 
JSONPath syntax of `path` |
+| json_object(expr1, expr2[, expr3, expr4 ...]) | Construct a `COMPLEX<json>` 
with alternating 'key' and 'value' arguments|
+| parse_json(expr) | Deserialize a JSON `STRING` into a `COMPLEX<json>`. If 
the input is not a `STRING` or it is invalid JSON, this function will result in 
an error.|
+| try_parse_json(expr) | Deserialize a JSON `STRING` into a `COMPLEX<json>`. 
If the input is not a `STRING` or it is invalid JSON, this function will result 
in a `NULL` value. |
+| to_json_string(expr) | Convert `expr` into a JSON `STRING` value |
+| json_keys(expr, path) | get array of field names from `expr` at the 
specified JSONPath `path`, or null if the data does not exist or have any 
fields |
+| json_paths(expr) | get array of all JSONPath paths available from `expr` |

Review Comment:
   ```suggestion
   | json_keys(expr, path) | Get array of field names from `expr` at the 
specified JSONPath `path`, or null if the data does not exist or have any 
fields |
   | json_paths(expr) | Get array of all JSONPath paths available from `expr` |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to