This is an automated email from the ASF dual-hosted git repository.
gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new da0055d6e37 fix issue with numeric vector selectors on json_value when least restrictive type contains arrays (#18053)
da0055d6e37 is described below
commit da0055d6e37cd3ce62ce20011a9692db41b3b132
Author: Clint Wylie <[email protected]>
AuthorDate: Fri May 30 13:51:43 2025 -0700
fix issue with numeric vector selectors on json_value when least restrictive type contains arrays (#18053)
* fix issue with numeric vector selectors on json_value when least restrictive type contains arrays
* more test
* better method names
* fix test
* fix tests, docs
* oops
---
docs/querying/nested-columns.md | 10 +-
docs/querying/sql-functions.md | 6 +-
docs/querying/sql-json-functions.md | 4 +-
.../druid/msq/exec/MSQComplexGroupByTest.java | 191 +++++++++++----------
.../nested/CompressedNestedDataComplexColumn.java | 35 +++-
.../segment/nested/NestedDataComplexColumn.java | 9 +-
.../segment/virtual/NestedFieldVirtualColumn.java | 32 ++--
.../druid/query/scan/NestedDataScanQueryTest.java | 2 +-
.../nested/NestedDataColumnSupplierTest.java | 16 +-
.../nested/NestedDataColumnSupplierV4Test.java | 10 +-
.../test/resources/nested-all-types-test-data.json | 14 +-
.../java/org/apache/druid/cli/DumpSegment.java | 4 +-
.../druid/sql/calcite/BaseCalciteQueryTest.java | 3 +-
.../sql/calcite/CalciteNestedDataQueryTest.java | 135 ++++++++++++++-
14 files changed, 327 insertions(+), 144 deletions(-)
diff --git a/docs/querying/nested-columns.md b/docs/querying/nested-columns.md
index 81cea824fb7..6073ec77436 100644
--- a/docs/querying/nested-columns.md
+++ b/docs/querying/nested-columns.md
@@ -27,7 +27,7 @@ import TabItem from '@theme/TabItem';
~ under the License.
-->
-Apache Druid supports directly storing nested data structures in `COMPLEX<json>` columns. `COMPLEX<json>` columns store a copy of the structured data in JSON format and specialized internal columns and indexes for nested literal values—STRING, LONG, and DOUBLE types, as well as ARRAY of STRING, LONG, and DOUBLE values. An optimized [virtual column](./virtual-columns.md#nested-field-virtual-column) allows Druid to read and filter these values at speeds consistent with standard Druid [...]
+Apache Druid supports directly storing nested data structures in `COMPLEX<json>` columns. `COMPLEX<json>` columns store a copy of the structured data in JSON format and specialized internal columns and indexes for nested primitive values—STRING, LONG, and DOUBLE types, as well as ARRAY of STRING, LONG, and DOUBLE values. An optimized [virtual column](./virtual-columns.md#nested-field-virtual-column) allows Druid to read and filter these values at speeds consistent with standard Druid [...]
Druid [SQL JSON functions](./sql-json-functions.md) allow you to extract,
transform, and create `COMPLEX<json>` values in SQL queries, using the
specialized virtual columns where appropriate. You can use the [JSON nested
columns functions](math-expr.md#json-functions) in [native
queries](./querying.md) using [expression virtual
columns](./virtual-columns.md#expression-virtual-column), and in native
ingestion with a
[`transformSpec`](../ingestion/ingestion-spec.md#transformspec).
@@ -485,11 +485,11 @@ Example query results:
### Extracting nested data elements
-The `JSON_VALUE` function is specially optimized to provide native Druid level performance when processing nested literal values, as if they were flattened, traditional, Druid column types. It does this by reading from the specialized nested columns and indexes that are built and stored in JSON objects when Druid creates segments.
+The `JSON_VALUE` function is specially optimized to provide native Druid level performance when processing nested primitive values, as if they were flattened, traditional, Druid column types. It does this by reading from the specialized nested columns and indexes that are built and stored in JSON objects when Druid creates segments.
Some operations using `JSON_VALUE` run faster than those using native Druid
columns. For example, filtering numeric types uses the indexes built for nested
numeric columns, which are not available for Druid DOUBLE, FLOAT, or LONG
columns.
-`JSON_VALUE` only returns literal types. Any paths that reference JSON objects or array types return null.
+`JSON_VALUE` only returns the primitive types `STRING`, `LONG`, and `DOUBLE`, plus, when you use the `RETURNING` syntax, `ARRAY<STRING>`, `ARRAY<LONG>`, or `ARRAY<DOUBLE>`. Any path that references a JSON object, or an array type without a matching array type in the `RETURNING` clause, returns null.
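The return-type rule stated in the added paragraph can be sketched as a small, self-contained check. Everything here (the class, method, and use of plain `List`/`Map` to stand in for JSON nodes) is illustrative only and is not Druid's implementation:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the documented JSON_VALUE return rule: primitives pass
// through, arrays pass through only when an ARRAY return type is requested via
// RETURNING, and objects (or arrays without an ARRAY return type) become null.
public class JsonValueRule
{
  public static Object jsonValue(Object node, boolean returningArrayType)
  {
    if (node instanceof String || node instanceof Long || node instanceof Double) {
      return node; // primitive STRING / LONG / DOUBLE
    }
    if (node instanceof List && returningArrayType) {
      return node; // ARRAY<...> only with RETURNING ... ARRAY
    }
    return null;   // objects, or arrays without an ARRAY return type
  }

  public static void main(String[] args)
  {
    System.out.println(jsonValue("hello", false));          // hello
    System.out.println(jsonValue(List.of(1L, 2L), false));  // null
    System.out.println(jsonValue(List.of(1L, 2L), true));   // [1, 2]
    System.out.println(jsonValue(Map.of("a", 1L), false));  // null
  }
}
```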
:::info
To achieve the best possible performance, use the `JSON_VALUE` function
whenever you query JSON objects.
@@ -586,7 +586,7 @@ These functions are primarily intended for use with
SQL-based ingestion to trans
You can use the `JSON_QUERY` function to extract a partial structure from any
JSON input and return results in a JSON object. Unlike `JSON_VALUE` it can
extract objects and arrays.
-The following example query illustrates the differences in output between `JSON_VALUE` and `JSON_QUERY`. The two output columns for `JSON_VALUE` contain null values only because `JSON_VALUE` only returns literal types.
+The following example query illustrates the differences in output between `JSON_VALUE` and `JSON_QUERY`. The two output columns for `JSON_VALUE` contain null values only because `JSON_VALUE` only returns primitive types.

@@ -680,7 +680,7 @@ Before you start using the nested columns feature, consider
the following known
- Directly using `COMPLEX<json>` columns and expressions is not well
integrated into the Druid query engine. It can result in errors or undefined
behavior when grouping and filtering, and when you use `COMPLEX<json>` objects
as inputs to aggregators. As a workaround, consider using `TO_JSON_STRING` to
coerce the values to strings before you perform these operations.
- Directly using array-typed outputs from `JSON_KEYS` and `JSON_PATHS` is
moderately supported by the Druid query engine. You can group on these outputs,
and there are a number of array expressions that can operate on these values,
such as `ARRAY_CONCAT_AGG`. However, some operations are not well defined for
use outside array-specific functions, such as filtering using `=` or `IS NULL`.
- Input validation for JSON SQL operators is currently incomplete, which
sometimes results in undefined behavior or unhelpful error messages.
-- Ingesting data with a very complex nested structure is potentially an expensive operation and may require you to tune ingestion tasks and/or cluster parameters to account for increased memory usage or overall task run time. When you tune your ingestion configuration, treat each nested literal field inside an object as a flattened top-level Druid column.
+- Ingesting data with a very complex nested structure is potentially an expensive operation and may require you to tune ingestion tasks and/or cluster parameters to account for increased memory usage or overall task run time. When you tune your ingestion configuration, treat each nested primitive field inside an object as a flattened top-level Druid column.
## Further reading
diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md
index 6fa0c162b03..ce0ce53460f 100644
--- a/docs/querying/sql-functions.md
+++ b/docs/querying/sql-functions.md
@@ -3304,7 +3304,7 @@ Returns the following:
## JSON_PATHS
-Returns an array of all paths which refer to literal values in an expression, in JSONPath format.
+Returns an array of all paths which refer to primitive values in an expression, in JSONPath format.
* **Syntax:** `JSON_PATHS(expr)`
* **Function type:** JSON
@@ -3397,11 +3397,11 @@ Returns the following:
## JSON_VALUE
-Extracts a literal value from an expression at a specified path.
+Extracts a primitive value from an expression at a specified path.
If you include `RETURNING` and specify a SQL type (such as `VARCHAR`,
`BIGINT`, `DOUBLE`) the function plans the query using the suggested type.
If `RETURNING` isn't included, the function attempts to infer the type based
on the context.
-If the function can't infer the type, it defaults to `VARCHAR`.
+If the function can't infer the type, it defaults to `VARCHAR`. Primitive arrays can also be returned, but only if `RETURNING` specifies an `ARRAY` type, such as `RETURNING VARCHAR ARRAY`.
* **Syntax:** `JSON_VALUE(expr, path [RETURNING sqlType])`
* **Function type:** JSON
diff --git a/docs/querying/sql-json-functions.md
b/docs/querying/sql-json-functions.md
index ea090ae2c28..9d01cec1e44 100644
--- a/docs/querying/sql-json-functions.md
+++ b/docs/querying/sql-json-functions.md
@@ -40,10 +40,10 @@ You can use the following JSON functions to extract,
transform, and create `COMP
|`JSON_KEYS(expr, path)`| Returns an array of field names from `expr` at the
specified `path`.|
|`JSON_MERGE(expr1, expr2[, expr3 ...])`| Merges two or more JSON `STRING` or
`COMPLEX<json>` values into one, preserving the rightmost value when there are
key overlaps. Returns `NULL` if any argument is `NULL`. Always returns a
`COMPLEX<json>` object.|
|`JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])` |
Constructs a new `COMPLEX<json>` object from one or more expressions. The `KEY`
expressions must evaluate to string types. The `VALUE` expressions can be
composed of any input type, including other `COMPLEX<json>` objects. The
function can accept colon-separated key-value pairs. The following syntax is
equivalent: `JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`.|
-|`JSON_PATHS(expr)`| Returns an array of all paths which refer to literal values in `expr` in JSONPath format. |
+|`JSON_PATHS(expr)`| Returns an array of all paths which refer to primitive values in `expr` in JSONPath format. |
|`JSON_QUERY(expr, path)`| Extracts a `COMPLEX<json>` value from `expr`, at
the specified `path`. |
|`JSON_QUERY_ARRAY(expr, path)`| Extracts an `ARRAY<COMPLEX<json>>` value from
`expr` at the specified `path`. If the value isn't an `ARRAY`, the function
translates it into a single element `ARRAY` containing the value at `path`.
Mainly used to extract arrays of objects to use as inputs to other [array
functions](./sql-array-functions.md).|
-|`JSON_VALUE(expr, path [RETURNING sqlType])`| Extracts a literal value from `expr` at the specified `path`. If you include `RETURNING` and specify a SQL type (such as `VARCHAR`, `BIGINT`, `DOUBLE`) the function plans the query using the suggested type. If `RETURNING` isn't included, the function attempts to infer the type based on the context. If the function can't infer the type, it defaults to `VARCHAR`.|
+|`JSON_VALUE(expr, path [RETURNING sqlType])`| Extracts a primitive value from `expr` at the specified `path`. If you include `RETURNING` and specify a SQL type (such as `VARCHAR`, `BIGINT`, `DOUBLE`) the function plans the query using the suggested type. If `RETURNING` isn't included, the function attempts to infer the type based on the context. If the function can't infer the type, it defaults to `VARCHAR`. Primitive arrays can also be returned, but only if `RETURNING` is specified as [...]
|`PARSE_JSON(expr)`|Parses `expr` into a `COMPLEX<json>` object. This function
deserializes JSON values when processing them, translating stringified JSON
into a nested structure. If the input is invalid JSON or not a `VARCHAR`, it
returns an error.|
|`TRY_PARSE_JSON(expr)`|Parses `expr` into a `COMPLEX<json>` object. This
operator deserializes JSON values when processing them, translating stringified
JSON into a nested structure. If the input is invalid JSON or not a `VARCHAR`,
it returns a `NULL` value.|
|`TO_JSON_STRING(expr)`|Serializes `expr` into a JSON string.|
diff --git
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQComplexGroupByTest.java
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQComplexGroupByTest.java
index e90c61a6145..db90d4a8a1e 100644
---
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQComplexGroupByTest.java
+++
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQComplexGroupByTest.java
@@ -19,9 +19,7 @@
package org.apache.druid.msq.exec;
-import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
-import com.google.common.collect.ImmutableSet;
import org.apache.druid.data.input.impl.JsonInputFormat;
import org.apache.druid.data.input.impl.LocalInputSource;
import org.apache.druid.data.input.impl.systemfield.SystemFields;
@@ -49,6 +47,7 @@ import org.apache.druid.query.groupby.GroupByQueryConfig;
import org.apache.druid.query.groupby.orderby.DefaultLimitSpec;
import org.apache.druid.query.groupby.orderby.OrderByColumnSpec;
import org.apache.druid.query.ordering.StringComparators;
+import org.apache.druid.segment.TestHelper;
import org.apache.druid.segment.column.ColumnType;
import org.apache.druid.segment.column.RowSignature;
import org.apache.druid.segment.nested.StructuredData;
@@ -72,7 +71,9 @@ import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
+import java.util.List;
import java.util.Map;
+import java.util.Set;
public class MSQComplexGroupByTest extends MSQTestBase
{
@@ -118,7 +119,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
dataFileSignatureJsonString =
queryFramework().queryJsonMapper().writeValueAsString(dataFileSignature);
dataFileExternalDataSource = new ExternalDataSource(
- new LocalInputSource(null, null, ImmutableList.of(dataFile),
SystemFields.none()),
+ new LocalInputSource(null, null, List.of(dataFile),
SystemFields.none()),
new JsonInputFormat(null, null, null, null, null),
dataFileSignature
);
@@ -143,24 +144,26 @@ public class MSQComplexGroupByTest extends MSQTestBase
+ " GROUP BY 1\n"
+ " PARTITIONED BY ALL")
.setQueryContext(context)
- .setExpectedSegments(ImmutableSet.of(SegmentId.of("foo1",
Intervals.ETERNITY, "test", 0)))
+ .setExpectedSegments(Set.of(SegmentId.of("foo1",
Intervals.ETERNITY, "test", 0)))
.setExpectedDataSource("foo1")
.setExpectedRowSignature(RowSignature.builder()
.add("__time",
ColumnType.LONG)
.add("obj",
ColumnType.NESTED_DATA)
.add("cnt",
ColumnType.LONG)
.build())
- .setExpectedResultRows(ImmutableList.of(
+ .setExpectedResultRows(List.of(
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 500,
- "b", ImmutableMap.of(
- "x", "e",
- "z", ImmutableList.of(1, 2, 3, 4)
+ Map.of(
+ "a", 600,
+ "b", Map.of(
+ "x", "f",
+ "y", 1.1,
+ "z", List.of(6, 7, 8, 9)
),
- "v", "a"
+ "c", 12.3,
+ "v", "b"
)
),
1L
@@ -168,13 +171,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
+ Map.of(
"a", 100,
- "b", ImmutableMap.of(
+ "b", Map.of(
"x", "a",
"y", 1.1,
- "z", ImmutableList.of(1, 2, 3, 4)
+ "z", List.of(1, 2, 3, 4)
),
+ "c", 100,
"v", Collections.emptyList()
)
),
@@ -183,13 +187,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 700,
- "b", ImmutableMap.of(
- "x", "g",
+ Map.of(
+ "a", 200,
+ "b", Map.of(
+ "x", "b",
"y", 1.1,
- "z", Arrays.asList(9, null, 9, 9)
+ "z", List.of(2, 4, 6)
),
+ "c", List.of("a", "b"),
"v", Collections.emptyList()
)
),
@@ -198,13 +203,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 200,
- "b", ImmutableMap.of(
- "x", "b",
+ Map.of(
+ "a", 400,
+ "b", Map.of(
+ "x", "d",
"y", 1.1,
- "z", ImmutableList.of(2, 4, 6)
+ "z", List.of(3, 4)
),
+ "c", Map.of("a", 1),
"v", Collections.emptyList()
)
),
@@ -213,14 +219,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 600,
- "b", ImmutableMap.of(
- "x", "f",
- "y", 1.1,
- "z", ImmutableList.of(6, 7, 8, 9)
+ Map.of(
+ "a", 500,
+ "b", Map.of(
+ "x", "e",
+ "z", List.of(1, 2, 3, 4)
),
- "v", "b"
+ "c", "hello",
+ "v", "a"
)
),
1L
@@ -228,13 +234,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 400,
- "b", ImmutableMap.of(
- "x", "d",
+ TestHelper.makeMap(
+ "a", 700,
+ "b", Map.of(
+ "x", "g",
"y", 1.1,
- "z", ImmutableList.of(3, 4)
+ "z", Arrays.asList(9, null, 9, 9)
),
+ "c", null,
"v", Collections.emptyList()
)
),
@@ -242,7 +249,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
},
new Object[]{
0L,
- StructuredData.wrap(ImmutableMap.of("a", 300)),
+ StructuredData.wrap(Map.of("a", 300)),
1L
}
))
@@ -271,7 +278,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
+ " GROUP BY 1\n"
+ " PARTITIONED BY ALL")
.setQueryContext(adjustedContext)
- .setExpectedSegments(ImmutableSet.of(SegmentId.of("foo1",
Intervals.ETERNITY, "test", 0)))
+ .setExpectedSegments(Set.of(SegmentId.of("foo1",
Intervals.ETERNITY, "test", 0)))
.setExpectedDataSource("foo1")
.setExpectedRowSignature(RowSignature.builder()
.add("__time",
ColumnType.LONG)
@@ -279,17 +286,19 @@ public class MSQComplexGroupByTest extends MSQTestBase
.add("cnt",
ColumnType.LONG)
.build())
.addExpectedAggregatorFactory(new
LongSumAggregatorFactory("cnt", "cnt"))
- .setExpectedResultRows(ImmutableList.of(
+ .setExpectedResultRows(List.of(
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 500,
- "b", ImmutableMap.of(
- "x", "e",
- "z", ImmutableList.of(1, 2, 3, 4)
+ Map.of(
+ "a", 600,
+ "b", Map.of(
+ "x", "f",
+ "y", 1.1,
+ "z", List.of(6, 7, 8, 9)
),
- "v", "a"
+ "c", 12.3,
+ "v", "b"
)
),
1L
@@ -297,13 +306,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
+ Map.of(
"a", 100,
- "b", ImmutableMap.of(
+ "b", Map.of(
"x", "a",
"y", 1.1,
- "z", ImmutableList.of(1, 2, 3, 4)
+ "z", List.of(1, 2, 3, 4)
),
+ "c", 100,
"v", Collections.emptyList()
)
),
@@ -312,13 +322,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 700,
- "b", ImmutableMap.of(
- "x", "g",
+ Map.of(
+ "a", 200,
+ "b", Map.of(
+ "x", "b",
"y", 1.1,
- "z", Arrays.asList(9, null, 9, 9)
+ "z", List.of(2, 4, 6)
),
+ "c", List.of("a", "b"),
"v", Collections.emptyList()
)
),
@@ -327,13 +338,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 200,
- "b", ImmutableMap.of(
- "x", "b",
+ Map.of(
+ "a", 400,
+ "b", Map.of(
+ "x", "d",
"y", 1.1,
- "z", ImmutableList.of(2, 4, 6)
+ "z", List.of(3, 4)
),
+ "c", Map.of("a", 1),
"v", Collections.emptyList()
)
),
@@ -342,14 +354,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 600,
- "b", ImmutableMap.of(
- "x", "f",
- "y", 1.1,
- "z", ImmutableList.of(6, 7, 8, 9)
+ Map.of(
+ "a", 500,
+ "b", Map.of(
+ "x", "e",
+ "z", List.of(1, 2, 3, 4)
),
- "v", "b"
+ "c", "hello",
+ "v", "a"
)
),
1L
@@ -357,13 +369,14 @@ public class MSQComplexGroupByTest extends MSQTestBase
new Object[]{
0L,
StructuredData.wrap(
- ImmutableMap.of(
- "a", 400,
- "b", ImmutableMap.of(
- "x", "d",
+ TestHelper.makeMap(
+ "a", 700,
+ "b", Map.of(
+ "x", "g",
"y", 1.1,
- "z", ImmutableList.of(3, 4)
+ "z", Arrays.asList(9, null, 9, 9)
),
+ "c", null,
"v", Collections.emptyList()
)
),
@@ -371,7 +384,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
},
new Object[]{
0L,
- StructuredData.wrap(ImmutableMap.of("a", 300)),
+ StructuredData.wrap(Map.of("a", 300)),
1L
}
))
@@ -399,7 +412,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
+ " )\n"
+ " )\n"
+ " ORDER BY 1")
- .setQueryContext(ImmutableMap.of())
+ .setQueryContext(Map.of())
.setExpectedMSQSpec(LegacyMSQSpec
.builder()
.query(newScanQueryBuilder()
@@ -411,7 +424,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
.orderBy(Collections.singletonList(OrderBy.ascending("obj")))
.build()
)
- .columnMappings(new
ColumnMappings(ImmutableList.of(
+ .columnMappings(new
ColumnMappings(List.of(
new ColumnMapping("obj",
"obj")
)))
.tuningConfig(MSQTuningConfig.defaultConfig())
@@ -420,13 +433,13 @@ public class MSQComplexGroupByTest extends MSQTestBase
)
.setExpectedRowSignature(rowSignature)
.setQueryContext(context)
- .setExpectedResultRows(ImmutableList.of(
- new
Object[]{"{\"a\":500,\"b\":{\"x\":\"e\",\"z\":[1,2,3,4]},\"v\":\"a\"}"},
- new
Object[]{"{\"a\":100,\"b\":{\"x\":\"a\",\"y\":1.1,\"z\":[1,2,3,4]},\"v\":[]}"},
- new
Object[]{"{\"a\":700,\"b\":{\"x\":\"g\",\"y\":1.1,\"z\":[9,null,9,9]},\"v\":[]}"},
- new
Object[]{"{\"a\":200,\"b\":{\"x\":\"b\",\"y\":1.1,\"z\":[2,4,6]},\"v\":[]}"},
- new
Object[]{"{\"a\":600,\"b\":{\"x\":\"f\",\"y\":1.1,\"z\":[6,7,8,9]},\"v\":\"b\"}"},
- new
Object[]{"{\"a\":400,\"b\":{\"x\":\"d\",\"y\":1.1,\"z\":[3,4]},\"v\":[]}"},
+ .setExpectedResultRows(List.of(
+ new
Object[]{"{\"a\":600,\"b\":{\"x\":\"f\",\"y\":1.1,\"z\":[6,7,8,9]},\"c\":12.3,\"v\":\"b\"}"},
+ new
Object[]{"{\"a\":100,\"b\":{\"x\":\"a\",\"y\":1.1,\"z\":[1,2,3,4]},\"c\":100,\"v\":[]}"},
+ new
Object[]{"{\"a\":200,\"b\":{\"x\":\"b\",\"y\":1.1,\"z\":[2,4,6]},\"c\":[\"a\",\"b\"],\"v\":[]}"},
+ new
Object[]{"{\"a\":400,\"b\":{\"x\":\"d\",\"y\":1.1,\"z\":[3,4]},\"c\":{\"a\":1},\"v\":[]}"},
+ new
Object[]{"{\"a\":500,\"b\":{\"x\":\"e\",\"z\":[1,2,3,4]},\"c\":\"hello\",\"v\":\"a\"}"},
+ new
Object[]{"{\"a\":700,\"b\":{\"x\":\"g\",\"y\":1.1,\"z\":[9,null,9,9]},\"c\":null,\"v\":[]}"},
new Object[]{"{\"a\":300}"}
))
.verifyResults();
@@ -457,7 +470,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
+ " )\n"
+ " )\n"
+ " ORDER BY 1")
-
.setQueryContext(ImmutableMap.of(PlannerConfig.CTX_KEY_USE_APPROXIMATE_COUNT_DISTINCT,
false))
+
.setQueryContext(Map.of(PlannerConfig.CTX_KEY_USE_APPROXIMATE_COUNT_DISTINCT,
false))
.setExpectedMSQSpec(
LegacyMSQSpec
.builder()
@@ -488,7 +501,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
.setQuerySegmentSpec(querySegmentSpec(Intervals.ETERNITY))
.setGranularity(Granularities.ALL)
.setLimitSpec(new DefaultLimitSpec(
- ImmutableList.of(
+ List.of(
new OrderByColumnSpec(
"a0",
OrderByColumnSpec.Direction.ASCENDING,
@@ -500,7 +513,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
.setContext(modifiedContext)
.build()
)
- .columnMappings(new
ColumnMappings(ImmutableList.of(
+ .columnMappings(new ColumnMappings(List.of(
new ColumnMapping("a0", "distinct_obj")
)))
.tuningConfig(MSQTuningConfig.defaultConfig())
@@ -509,7 +522,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
)
.setExpectedRowSignature(rowSignature)
.setQueryContext(modifiedContext)
- .setExpectedResultRows(ImmutableList.of(
+ .setExpectedResultRows(Collections.singletonList(
new Object[]{7L}
))
.verifyResults();
@@ -524,7 +537,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
.add("cObj",
ColumnType.NESTED_DATA)
.build();
DataSource dataFileExternalDataSource2 = new ExternalDataSource(
- new LocalInputSource(null, null, ImmutableList.of(dataFile),
SystemFields.none()),
+ new LocalInputSource(null, null, List.of(dataFile),
SystemFields.none()),
new JsonInputFormat(null, null, null, null, null),
dataFileSignature
);
@@ -549,7 +562,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
+ " )\n"
+ " )\n"
+ " ORDER BY 1")
-
.setQueryContext(ImmutableMap.of(PlannerConfig.CTX_KEY_USE_APPROXIMATE_COUNT_DISTINCT,
false))
+
.setQueryContext(Map.of(PlannerConfig.CTX_KEY_USE_APPROXIMATE_COUNT_DISTINCT,
false))
.setExpectedMSQSpec(
LegacyMSQSpec
.builder()
@@ -580,7 +593,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
.setQuerySegmentSpec(querySegmentSpec(Intervals.ETERNITY))
.setGranularity(Granularities.ALL)
.setLimitSpec(new DefaultLimitSpec(
- ImmutableList.of(
+ List.of(
new OrderByColumnSpec(
"a0",
OrderByColumnSpec.Direction.ASCENDING,
@@ -592,7 +605,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
.setContext(modifiedContext)
.build()
)
- .columnMappings(new
ColumnMappings(ImmutableList.of(
+ .columnMappings(new ColumnMappings(List.of(
new ColumnMapping("a0", "distinct_obj")
)))
.tuningConfig(MSQTuningConfig.defaultConfig())
@@ -601,7 +614,7 @@ public class MSQComplexGroupByTest extends MSQTestBase
)
.setExpectedRowSignature(rowSignature)
.setQueryContext(modifiedContext)
- .setExpectedResultRows(ImmutableList.of(
+ .setExpectedResultRows(Collections.singletonList(
new Object[]{1L}
))
.verifyResults();
diff --git
a/processing/src/main/java/org/apache/druid/segment/nested/CompressedNestedDataComplexColumn.java
b/processing/src/main/java/org/apache/druid/segment/nested/CompressedNestedDataComplexColumn.java
index 5913425cc1a..472c7a21a7c 100644
---
a/processing/src/main/java/org/apache/druid/segment/nested/CompressedNestedDataComplexColumn.java
+++
b/processing/src/main/java/org/apache/druid/segment/nested/CompressedNestedDataComplexColumn.java
@@ -844,10 +844,11 @@ public abstract class
CompressedNestedDataComplexColumn<TStringDictionary extend
@Nullable
@Override
- public Set<ColumnType> getColumnTypes(List<NestedPathPart> path)
+ public Set<ColumnType> getFieldTypes(List<NestedPathPart> path)
{
String field = getField(path);
int index = fields.indexOf(field);
+ // if index is negative, check for an array element accessor in the path
if (index < 0) {
if (!path.isEmpty() && path.get(path.size() - 1) instanceof
NestedPathArrayElement) {
final String arrayField = getField(path.subList(0, path.size() - 1));
@@ -856,9 +857,9 @@ public abstract class
CompressedNestedDataComplexColumn<TStringDictionary extend
if (index < 0) {
return null;
}
- Set<ColumnType> arrayTypes =
FieldTypeInfo.convertToSet(fieldInfo.getTypes(index).getByteValue());
- Set<ColumnType> elementTypes =
Sets.newHashSetWithExpectedSize(arrayTypes.size());
- for (ColumnType type : arrayTypes) {
+ final Set<ColumnType> arrayFieldTypes =
FieldTypeInfo.convertToSet(fieldInfo.getTypes(index).getByteValue());
+ final Set<ColumnType> elementTypes =
Sets.newHashSetWithExpectedSize(arrayFieldTypes.size());
+ for (ColumnType type : arrayFieldTypes) {
if (type.isArray()) {
elementTypes.add((ColumnType) type.getElementType());
} else {
@@ -870,6 +871,32 @@ public abstract class
CompressedNestedDataComplexColumn<TStringDictionary extend
return
FieldTypeInfo.convertToSet(fieldInfo.getTypes(index).getByteValue());
}
+ @Nullable
+ @Override
+ public ColumnType getFieldLogicalType(List<NestedPathPart> path)
+ {
+ final String field = getField(path);
+ final Set<ColumnType> fieldTypes;
+ int index = fields.indexOf(field);
+ if (index < 0) {
+ if (!path.isEmpty() && path.get(path.size() - 1) instanceof
NestedPathArrayElement) {
+ final String arrayField = getField(path.subList(0, path.size() - 1));
+ index = fields.indexOf(arrayField);
+ }
+ if (index < 0) {
+ return null;
+ }
+ fieldTypes =
FieldTypeInfo.convertToSet(fieldInfo.getTypes(index).getByteValue());
+ } else {
+ fieldTypes =
FieldTypeInfo.convertToSet(fieldInfo.getTypes(index).getByteValue());
+ }
+ ColumnType leastRestrictiveType = null;
+ for (ColumnType type : fieldTypes) {
+ leastRestrictiveType =
ColumnType.leastRestrictiveType(leastRestrictiveType, type);
+ }
+ return leastRestrictiveType;
+ }
+
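The reduction in `getFieldLogicalType` above folds every observed field type through `ColumnType.leastRestrictiveType`. The shape of that fold can be illustrated with a self-contained sketch; the miniature type lattice here is an assumption for demonstration only and is far simpler than Druid's real coercion rules:

```java
import java.util.Set;

// Illustrative fold over a set of observed field types into one "least
// restrictive" type. The toy lattice (STRING absorbs everything, DOUBLE absorbs
// LONG, any array operand widens the result to an array of the joined element
// type) is hypothetical, not Druid's ColumnType.leastRestrictiveType.
public class LeastRestrictive
{
  enum T { LONG, DOUBLE, STRING, ARRAY_LONG, ARRAY_DOUBLE, ARRAY_STRING }

  static T scalarOf(T t)
  {
    // ARRAY_LONG -> LONG, etc.; scalars pass through
    return t.name().startsWith("ARRAY_") ? T.valueOf(t.name().substring(6)) : t;
  }

  static T join(T a, T b)
  {
    if (a == null) return b;
    if (a == b) return a;
    boolean anyArray = a.name().startsWith("ARRAY") || b.name().startsWith("ARRAY");
    T ea = scalarOf(a);
    T eb = scalarOf(b);
    T e = (ea == T.STRING || eb == T.STRING) ? T.STRING
        : (ea == T.DOUBLE || eb == T.DOUBLE) ? T.DOUBLE : T.LONG;
    return anyArray ? T.valueOf("ARRAY_" + e.name()) : e;
  }

  static T fold(Set<T> types)
  {
    T least = null;
    for (T t : types) {
      least = join(least, t);  // same shape as the loop in getFieldLogicalType
    }
    return least;
  }

  public static void main(String[] args)
  {
    // a field seen as LONG in some rows and ARRAY<LONG> in others widens to an array type
    System.out.println(fold(Set.of(T.LONG, T.ARRAY_LONG)));  // ARRAY_LONG
    System.out.println(fold(Set.of(T.LONG, T.DOUBLE)));      // DOUBLE
  }
}
```

This widening is exactly why the fix matters: once any row stores an array, the field's logical type becomes an array type, and numeric value selectors must account for it.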
@Nullable
@Override
public ColumnHolder getColumnHolder(List<NestedPathPart> path)
diff --git
a/processing/src/main/java/org/apache/druid/segment/nested/NestedDataComplexColumn.java
b/processing/src/main/java/org/apache/druid/segment/nested/NestedDataComplexColumn.java
index 18e4c81f858..816bb27ad14 100644
---
a/processing/src/main/java/org/apache/druid/segment/nested/NestedDataComplexColumn.java
+++
b/processing/src/main/java/org/apache/druid/segment/nested/NestedDataComplexColumn.java
@@ -95,7 +95,14 @@ public abstract class NestedDataComplexColumn implements
ComplexColumn
* Get all {@link ColumnType} for the nested field column
*/
@Nullable
- public abstract Set<ColumnType> getColumnTypes(List<NestedPathPart> path);
+ public abstract Set<ColumnType> getFieldTypes(List<NestedPathPart> path);
+
+ /**
+ * Reduces {@link #getFieldTypes(List)} for the nested field column using
+ * {@link ColumnType#leastRestrictiveType(ColumnType, ColumnType)}
+ */
+ @Nullable
+ public abstract ColumnType getFieldLogicalType(List<NestedPathPart> path);
/**
* Get a {@link ColumnHolder} for a nested field column to retrieve
metadata, the column itself, or indexes.
diff --git
a/processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java
b/processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java
index 663b272917b..f222990fe2e 100644
---
a/processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java
+++
b/processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java
@@ -518,13 +518,7 @@ public class NestedFieldVirtualColumn implements
VirtualColumn
// is JSON_VALUE which only returns literals, so we can use the nested
columns value selector
return new
RawFieldVectorObjectSelector(complexColumn.makeVectorObjectSelector(offset),
fieldSpec.parts);
}
- Set<ColumnType> types = complexColumn.getColumnTypes(fieldSpec.parts);
- ColumnType leastRestrictiveType = null;
- if (types != null) {
- for (ColumnType type : types) {
- leastRestrictiveType =
ColumnType.leastRestrictiveType(leastRestrictiveType, type);
- }
- }
+ final ColumnType leastRestrictiveType =
complexColumn.getFieldLogicalType(fieldSpec.parts);
if (leastRestrictiveType != null && leastRestrictiveType.isNumeric() &&
!Types.isNumeric(fieldSpec.expectedType)) {
return ExpressionVectorSelectors.castValueSelectorToObject(
offset,
@@ -651,8 +645,11 @@ public class NestedFieldVirtualColumn implements
VirtualColumn
}
BaseColumn theColumn = holder.getColumn();
if (!(theColumn instanceof NestedDataComplexColumn)) {
-
+ // not a nested column, but we can still try to coerce the values to the expected type of value selector if the path is the root path
if (fieldSpec.parts.isEmpty()) {
+ // coerce string columns (a bit presumptuous in general, but in practice these are going to be string columns
+ // ... revisit this if that ever changes)
if (theColumn instanceof DictionaryEncodedColumn) {
final VectorObjectSelector delegate =
theColumn.makeVectorObjectSelector(offset);
if (fieldSpec.expectedType != null &&
fieldSpec.expectedType.is(ValueType.LONG)) {
@@ -791,8 +788,10 @@ public class NestedFieldVirtualColumn implements
VirtualColumn
};
}
}
+ // otherwise, just use the column's native vector value selector (this might explode if not natively numeric)
return theColumn.makeVectorValueSelector(offset);
}
+ // array columns can also be handled if the path is a root level array element accessor
if (fieldSpec.parts.size() == 1 && fieldSpec.parts.get(0) instanceof
NestedPathArrayElement && theColumn instanceof VariantColumn) {
final VariantColumn<?> arrayColumn = (VariantColumn<?>) theColumn;
VectorObjectSelector arraySelector =
arrayColumn.makeVectorObjectSelector(offset);
@@ -1018,7 +1017,20 @@ public class NestedFieldVirtualColumn implements
VirtualColumn
return column.makeVectorValueSelector(fieldSpec.parts, offset);
}
- final VectorObjectSelector objectSelector =
column.makeVectorObjectSelector(fieldSpec.parts, offset);
+ final ColumnType leastRestrictiveType =
column.getFieldLogicalType(fieldSpec.parts);
+ final VectorObjectSelector fieldSelector =
column.makeVectorObjectSelector(fieldSpec.parts, offset);
+ final VectorObjectSelector objectSelector;
+ // if the field has array types, wrap the object selector in the array to scalar value coercer
+ if (leastRestrictiveType != null && leastRestrictiveType.isArray()) {
+ objectSelector = makeVectorArrayToScalarObjectSelector(
+ offset,
+ fieldSelector,
+
ExpressionType.fromColumnTypeStrict(leastRestrictiveType.getElementType()),
+ ExpressionType.fromColumnTypeStrict(fieldSpec.expectedType)
+ );
+ } else {
+ objectSelector = fieldSelector;
+ }
if (fieldSpec.expectedType.is(ValueType.LONG)) {
return new BaseLongVectorValueSelector(offset)
{
@@ -1175,7 +1187,7 @@ public class NestedFieldVirtualColumn implements VirtualColumn
return NoIndexesColumnIndexSupplier.getInstance();
}
if (fieldSpec.expectedType != null) {
- final Set<ColumnType> types = nestedColumn.getColumnTypes(fieldSpec.parts);
+ final Set<ColumnType> types = nestedColumn.getFieldTypes(fieldSpec.parts);
// if the expected output type is numeric but not all of the input types are numeric, we might have more
// null values than the null value bitmap is tracking, so fall back to not using indexes
if (fieldSpec.expectedType.isNumeric() && (types == null || types.stream().anyMatch(t -> !t.isNumeric()))) {
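As an aside for readers following along, the guard in the hunk above can be sketched in isolation. This is a hedged stand-in, not Druid's API: `useIndexes`, the string type names, and the `NUMERIC` set here are hypothetical substitutes for `ColumnType` and the real capabilities check.

```java
import java.util.Set;

// Hedged sketch of the index-fallback rule in the hunk above: when the
// expected output type is numeric but the field also stores non-numeric
// types, rows of those types read as null without being tracked by the
// null-value bitmap, so value indexes cannot be trusted.
public class NumericIndexFallback
{
  static final Set<String> NUMERIC = Set.of("LONG", "FLOAT", "DOUBLE");

  static boolean useIndexes(String expectedType, Set<String> fieldTypes)
  {
    if (!NUMERIC.contains(expectedType)) {
      // non-numeric expected output: coercion introduces no extra nulls
      return true;
    }
    // numeric expected output: every stored type must be numeric as well
    return fieldTypes != null && fieldTypes.stream().allMatch(NUMERIC::contains);
  }

  public static void main(String[] args)
  {
    // a field like obj.c in this patch stores LONG, STRING, DOUBLE, arrays, ...
    System.out.println(useIndexes("DOUBLE", Set.of("LONG", "STRING"))); // false
    System.out.println(useIndexes("DOUBLE", Set.of("LONG", "DOUBLE"))); // true
  }
}
```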
diff --git
a/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
b/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
index 9b3e644fb7b..1ef233237ed 100644
---
a/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
+++
b/processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java
@@ -780,7 +780,7 @@ public class NestedDataScanQueryTest extends
InitializedNullHandlingTest
Assert.assertEquals(1, resultsRealtime.size());
Assert.assertEquals(resultsRealtime.size(), resultsSegments.size());
Assert.assertEquals(
- "[[1672531200000, null, null, null, 1, 51, -0.13, 1, [], [51, -35],
{a=700, b={x=g, y=1.1, z=[9, null, 9, 9]}, v=[]}, {x=400, y=[{l=[null], m=100,
n=5}, {l=[a, b, c], m=a, n=1}], z={}}, null, [a, b], null, [2, 3], null,
[null], null, [1, 0, 1], null, [{x=1}, {x=2}], null, hello, 1234, 1.234, {x=1,
y=hello, z={a=1.1, b=1234, c=[a, b, c], d=[]}}, [a, b, c], [1, 2, 3], [1.1,
2.2, 3.3], [], {}, [null, null], [{}, {}, {}], [{a=b, x=1, y=1.3}], 1],
[1672531200000, , 2, null, 0, b, 1.1, [...]
+ "[[1672531200000, null, null, null, 1, 51, -0.13, 1, [], [51, -35],
{a=700, b={x=g, y=1.1, z=[9, null, 9, 9]}, c=null, v=[]}, {x=400, y=[{l=[null],
m=100, n=5}, {l=[a, b, c], m=a, n=1}], z={}}, null, [a, b], null, [2, 3], null,
[null], null, [1, 0, 1], null, [{x=1}, {x=2}], null, hello, 1234, 1.234, {x=1,
y=hello, z={a=1.1, b=1234, c=[a, b, c], d=[]}}, [a, b, c], [1, 2, 3], [1.1,
2.2, 3.3], [], {}, [null, null], [{}, {}, {}], [{a=b, x=1, y=1.3}], 1],
[1672531200000, , 2, null, 0, [...]
resultsSegments.get(0).getEvents().toString()
);
Assert.assertEquals(resultsSegments.get(0).getEvents().toString(),
resultsRealtime.get(0).getEvents().toString());
diff --git
a/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierTest.java
b/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierTest.java
index 991456aa9c5..dfc68aaf8bb 100644
---
a/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierTest.java
+++
b/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierTest.java
@@ -335,7 +335,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
ColumnValueSelector<?> rawSelector =
column.makeColumnValueSelector(offset);
final List<NestedPathPart> xPath = NestedPathFinder.parseJsonPath("$.x");
- Assert.assertEquals(ImmutableSet.of(ColumnType.LONG),
column.getColumnTypes(xPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.LONG),
column.getFieldTypes(xPath));
Assert.assertEquals(ColumnType.LONG,
column.getColumnHolder(xPath).getCapabilities().toColumnType());
ColumnValueSelector<?> xSelector = column.makeColumnValueSelector(xPath,
offset);
DimensionSelector xDimSelector = column.makeDimensionSelector(xPath,
offset, null);
@@ -346,7 +346,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
NullValueIndex xNulls = xIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> yPath = NestedPathFinder.parseJsonPath("$.y");
- Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE),
column.getColumnTypes(yPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE),
column.getFieldTypes(yPath));
Assert.assertEquals(ColumnType.DOUBLE,
column.getColumnHolder(yPath).getCapabilities().toColumnType());
ColumnValueSelector<?> ySelector = column.makeColumnValueSelector(yPath,
offset);
DimensionSelector yDimSelector = column.makeDimensionSelector(yPath,
offset, null);
@@ -357,7 +357,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
NullValueIndex yNulls = yIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> zPath = NestedPathFinder.parseJsonPath("$.z");
- Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getColumnTypes(zPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getFieldTypes(zPath));
Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(zPath).getCapabilities().toColumnType());
ColumnValueSelector<?> zSelector = column.makeColumnValueSelector(zPath,
offset);
DimensionSelector zDimSelector = column.makeDimensionSelector(zPath,
offset, null);
@@ -370,7 +370,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
final List<NestedPathPart> vPath = NestedPathFinder.parseJsonPath("$.v");
Assert.assertEquals(
ImmutableSet.of(ColumnType.STRING, ColumnType.LONG, ColumnType.DOUBLE),
- column.getColumnTypes(vPath)
+ column.getFieldTypes(vPath)
);
Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(vPath).getCapabilities().toColumnType());
ColumnValueSelector<?> vSelector = column.makeColumnValueSelector(vPath,
offset);
@@ -382,7 +382,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
NullValueIndex vNulls = vIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> nullishPath =
NestedPathFinder.parseJsonPath("$.nullish");
- Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getColumnTypes(nullishPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getFieldTypes(nullishPath));
Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(nullishPath).getCapabilities().toColumnType());
ColumnValueSelector<?> nullishSelector =
column.makeColumnValueSelector(nullishPath, offset);
DimensionSelector nullishDimSelector =
column.makeDimensionSelector(nullishPath, offset, null);
@@ -443,7 +443,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
VectorObjectSelector rawVectorSelectorFiltered =
column.makeVectorObjectSelector(bitmapVectorOffset);
final List<NestedPathPart> sPath = NestedPathFinder.parseJsonPath("$.s");
- Assert.assertEquals(ImmutableSet.of(ColumnType.STRING_ARRAY),
column.getColumnTypes(sPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.STRING_ARRAY),
column.getFieldTypes(sPath));
Assert.assertEquals(ColumnType.STRING_ARRAY,
column.getColumnHolder(sPath).getCapabilities().toColumnType());
ColumnValueSelector<?> sSelector = column.makeColumnValueSelector(sPath,
offset);
VectorObjectSelector sVectorSelector =
column.makeVectorObjectSelector(sPath, vectorOffset);
@@ -468,7 +468,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
Assert.assertNull(sElementIndexSupplier.as(NullValueIndex.class));
final List<NestedPathPart> lPath = NestedPathFinder.parseJsonPath("$.l");
- Assert.assertEquals(ImmutableSet.of(ColumnType.LONG_ARRAY),
column.getColumnTypes(lPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.LONG_ARRAY),
column.getFieldTypes(lPath));
Assert.assertEquals(ColumnType.LONG_ARRAY,
column.getColumnHolder(lPath).getCapabilities().toColumnType());
ColumnValueSelector<?> lSelector = column.makeColumnValueSelector(lPath,
offset);
VectorObjectSelector lVectorSelector =
column.makeVectorObjectSelector(lPath, vectorOffset);
@@ -494,7 +494,7 @@ public class NestedDataColumnSupplierTest extends
InitializedNullHandlingTest
Assert.assertNull(lElementIndexSupplier.as(NullValueIndex.class));
final List<NestedPathPart> dPath = NestedPathFinder.parseJsonPath("$.d");
- Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE_ARRAY),
column.getColumnTypes(dPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE_ARRAY),
column.getFieldTypes(dPath));
Assert.assertEquals(ColumnType.DOUBLE_ARRAY,
column.getColumnHolder(dPath).getCapabilities().toColumnType());
ColumnValueSelector<?> dSelector = column.makeColumnValueSelector(dPath,
offset);
VectorObjectSelector dVectorSelector =
column.makeVectorObjectSelector(dPath, vectorOffset);
diff --git
a/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierV4Test.java
b/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierV4Test.java
index 6e27bf49fbf..feea28374a4 100644
---
a/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierV4Test.java
+++
b/processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierV4Test.java
@@ -200,7 +200,7 @@ public class NestedDataColumnSupplierV4Test extends
InitializedNullHandlingTest
SimpleAscendingOffset offset = new SimpleAscendingOffset(data.size());
ColumnValueSelector<?> rawSelector =
column.makeColumnValueSelector(offset);
final List<NestedPathPart> xPath = NestedPathFinder.parseJsonPath("$.x");
- Assert.assertEquals(ImmutableSet.of(ColumnType.LONG),
column.getColumnTypes(xPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.LONG),
column.getFieldTypes(xPath));
Assert.assertEquals(ColumnType.LONG,
column.getColumnHolder(xPath).getCapabilities().toColumnType());
ColumnValueSelector<?> xSelector = column.makeColumnValueSelector(xPath,
offset);
DimensionSelector xDimSelector = column.makeDimensionSelector(xPath,
offset, null);
@@ -210,7 +210,7 @@ public class NestedDataColumnSupplierV4Test extends
InitializedNullHandlingTest
DruidPredicateIndexes xPredicateIndex =
xIndexSupplier.as(DruidPredicateIndexes.class);
NullValueIndex xNulls = xIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> yPath = NestedPathFinder.parseJsonPath("$.y");
- Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE),
column.getColumnTypes(yPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE),
column.getFieldTypes(yPath));
Assert.assertEquals(ColumnType.DOUBLE,
column.getColumnHolder(yPath).getCapabilities().toColumnType());
ColumnValueSelector<?> ySelector = column.makeColumnValueSelector(yPath,
offset);
DimensionSelector yDimSelector = column.makeDimensionSelector(yPath,
offset, null);
@@ -220,7 +220,7 @@ public class NestedDataColumnSupplierV4Test extends
InitializedNullHandlingTest
DruidPredicateIndexes yPredicateIndex =
yIndexSupplier.as(DruidPredicateIndexes.class);
NullValueIndex yNulls = yIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> zPath = NestedPathFinder.parseJsonPath("$.z");
- Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getColumnTypes(zPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getFieldTypes(zPath));
Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(zPath).getCapabilities().toColumnType());
ColumnValueSelector<?> zSelector = column.makeColumnValueSelector(zPath,
offset);
DimensionSelector zDimSelector = column.makeDimensionSelector(zPath,
offset, null);
@@ -232,7 +232,7 @@ public class NestedDataColumnSupplierV4Test extends
InitializedNullHandlingTest
final List<NestedPathPart> vPath = NestedPathFinder.parseJsonPath("$.v");
Assert.assertEquals(
ImmutableSet.of(ColumnType.STRING, ColumnType.LONG, ColumnType.DOUBLE),
- column.getColumnTypes(vPath)
+ column.getFieldTypes(vPath)
);
Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(vPath).getCapabilities().toColumnType());
ColumnValueSelector<?> vSelector = column.makeColumnValueSelector(vPath,
offset);
@@ -243,7 +243,7 @@ public class NestedDataColumnSupplierV4Test extends
InitializedNullHandlingTest
DruidPredicateIndexes vPredicateIndex =
vIndexSupplier.as(DruidPredicateIndexes.class);
NullValueIndex vNulls = vIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> nullishPath =
NestedPathFinder.parseJsonPath("$.nullish");
- Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getColumnTypes(nullishPath));
+ Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getFieldTypes(nullishPath));
Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(nullishPath).getCapabilities().toColumnType());
ColumnValueSelector<?> nullishSelector =
column.makeColumnValueSelector(nullishPath, offset);
DimensionSelector nullishDimSelector =
column.makeDimensionSelector(nullishPath, offset, null);
diff --git a/processing/src/test/resources/nested-all-types-test-data.json
b/processing/src/test/resources/nested-all-types-test-data.json
index 8aa221062c7..5572c72799b 100644
--- a/processing/src/test/resources/nested-all-types-test-data.json
+++ b/processing/src/test/resources/nested-all-types-test-data.json
@@ -1,7 +1,7 @@
-{"timestamp": "2023-01-01T00:00:00", "str":"a", "long":1, "double":1.0,
"bool": true, "variant": 1, "variantNumeric": 1,
"variantEmptyObj":1, "variantEmtpyArray":1, "variantWithArrays": 1,
"obj":{"a": 100, "b": {"x": "a", "y": 1.1, "z": [1, 2, 3, 4]}, "v": []},
"complexObj":{"x": 1234, "y": [{"l": ["a", "b", "c"], "m": "a", "n": 1},{"l":
["a", "b", "c"], "m": "a", "n": 1}], "z": {"a": [1.1, 2.2, 3.3], "b": true}},
"arrayString": ["a", "b"], "ar [...]
-{"timestamp": "2023-01-01T00:00:00", "str":"", "long":2,
"bool": false, "variant": "b", "variantNumeric": 1.1,
"variantEmptyObj":"b", "variantEmtpyArray":2, "variantWithArrays": "b",
"obj":{"a": 200, "b": {"x": "b", "y": 1.1, "z": [2, 4, 6]}, "v": []},
"complexObj":{"x": 10, "y": [{"l": ["b", "b", "c"], "m": "b", "n": 2}, [1, 2,
3]], "z": {"a": [5.5], "b": false}},
"arrayString": ["a", "b", "c"], "ar [...]
-{"timestamp": "2023-01-01T00:00:00", "str":"null", "long":3, "double":2.0,
"variant": 3.0, "variantNumeric": 1.0,
"variantEmptyObj":3.3, "variantEmtpyArray":3, "variantWithArrays": 3.0,
"obj":{"a": 300},
"complexObj":{"x": 4.4, "y": [{"l": [], "m": 100, "n": 3},{"l": ["a"]}, {"l":
["b"], "n": []}], "z": {"a": [], "b": true}},
"arrayString": ["b", "c"], "ar [...]
-{"timestamp": "2023-01-01T00:00:00", "str":"b", "long":4, "double":3.3,
"bool": true, "variant": "1",
"variantEmptyObj":{}, "variantEmtpyArray":4, "variantWithArrays": "1",
"obj":{"a": 400, "b": {"x": "d", "y": 1.1, "z": [3, 4]}, "v": []},
"complexObj":{"x": 1234,
"z": {"a": [1.1, 2.2, 3.3], "b": true}},
"arrayString": ["d", "e"], "ar [...]
-{"timestamp": "2023-01-01T00:00:00", "str":"c", "long": null, "double":4.4,
"bool": true, "variant": "hello", "variantNumeric": -1000,
"variantEmptyObj":{}, "variantEmtpyArray":[], "variantWithArrays": "hello",
"obj":{"a": 500, "b": {"x": "e", "z": [1, 2, 3, 4]}, "v": "a"},
"complexObj":{"x": 11, "y": [],
"z": {"a": [null], "b": false}},
"arrayString": null, [...]
-{"timestamp": "2023-01-01T00:00:00", "str":"d", "long":5, "double":5.9,
"bool": false, "variantNumeric": 3.33,
"variantEmptyObj":"a", "variantEmtpyArray":6,
"obj":{"a": 600, "b": {"x": "f", "y": 1.1, "z": [6, 7, 8, 9]}, "v": "b"},
"arrayString": ["a", "b"], "ar [...]
-{"timestamp": "2023-01-01T00:00:00", "str":null,
"double":null, "bool": true, "variant": 51, "variantNumeric": -0.13,
"variantEmptyObj":1, "variantEmtpyArray":[], "variantWithArrays": [51, -35],
"obj":{"a": 700, "b": {"x": "g", "y": 1.1, "z": [9, null, 9, 9]}, "v": []},
"complexObj":{"x": 400, "y": [{"l": [null], "m": 100, "n": 5},{"l": ["a", "b",
"c"], "m": "a", "n": 1}], "z": {}},
"ar [...]
+{"timestamp": "2023-01-01T00:00:00", "str":"a", "long":1, "double":1.0,
"bool": true, "variant": 1, "variantNumeric": 1,
"variantEmptyObj":1, "variantEmtpyArray":1, "variantWithArrays": 1,
"obj":{"a": 100, "b": {"x": "a", "y": 1.1, "z": [1, 2, 3, 4]}, "c": 100, "v":
[]}, "complexObj":{"x": 1234, "y": [{"l": ["a", "b", "c"], "m": "a", "n":
1},{"l": ["a", "b", "c"], "m": "a", "n": 1}], "z": {"a": [1.1, 2.2, 3.3], "b":
true}}, "arrayString": ["a", "b"], [...]
+{"timestamp": "2023-01-01T00:00:00", "str":"", "long":2,
"bool": false, "variant": "b", "variantNumeric": 1.1,
"variantEmptyObj":"b", "variantEmtpyArray":2, "variantWithArrays": "b",
"obj":{"a": 200, "b": {"x": "b", "y": 1.1, "z": [2, 4, 6]}, "c": ["a", "b"],
"v": []}, "complexObj":{"x": 10, "y": [{"l": ["b", "b", "c"], "m": "b", "n":
2}, [1, 2, 3]], "z": {"a": [5.5], "b": false}},
"arrayString": ["a", "b", "c"] [...]
+{"timestamp": "2023-01-01T00:00:00", "str":"null", "long":3, "double":2.0,
"variant": 3.0, "variantNumeric": 1.0,
"variantEmptyObj":3.3, "variantEmtpyArray":3, "variantWithArrays": 3.0,
"obj":{"a": 300},
"complexObj":{"x": 4.4, "y": [{"l": [], "m": 100, "n": 3},{"l": ["a"]},
{"l": ["b"], "n": []}], "z": {"a": [], "b": true}},
"arrayString": ["b", "c"], [...]
+{"timestamp": "2023-01-01T00:00:00", "str":"b", "long":4, "double":3.3,
"bool": true, "variant": "1",
"variantEmptyObj":{}, "variantEmtpyArray":4, "variantWithArrays": "1",
"obj":{"a": 400, "b": {"x": "d", "y": 1.1, "z": [3, 4]}, "c": {"a": 1},"v":
[]}, "complexObj":{"x": 1234, "z": {"a": [1.1, 2.2, 3.3], "b": true}},
"arrayString": ["d", "e"], [...]
+{"timestamp": "2023-01-01T00:00:00", "str":"c", "long": null, "double":4.4,
"bool": true, "variant": "hello", "variantNumeric": -1000,
"variantEmptyObj":{}, "variantEmtpyArray":[], "variantWithArrays": "hello",
"obj":{"a": 500, "b": {"x": "e", "z": [1, 2, 3, 4]}, "c": "hello","v": "a"},
"complexObj":{"x": 11, "y": [], "z": {"a": [null], "b": false}},
"arrayString": null, [...]
+{"timestamp": "2023-01-01T00:00:00", "str":"d", "long":5, "double":5.9,
"bool": false, "variantNumeric": 3.33,
"variantEmptyObj":"a", "variantEmtpyArray":6,
"obj":{"a": 600, "b": {"x": "f", "y": 1.1, "z": [6, 7, 8, 9]}, "c": 12.3, "v":
"b"},
"arrayString": ["a", "b"], [...]
+{"timestamp": "2023-01-01T00:00:00", "str":null,
"double":null, "bool": true, "variant": 51, "variantNumeric": -0.13,
"variantEmptyObj":1, "variantEmtpyArray":[], "variantWithArrays": [51, -35],
"obj":{"a": 700, "b": {"x": "g", "y": 1.1, "z": [9, null, 9, 9]}, "c": null,
"v": []}, "complexObj":{"x": 400, "y": [{"l": [null], "m": 100, "n": 5},{"l":
["a", "b", "c"], "m": "a", "n": 1}], "z": {}},
[...]
diff --git a/services/src/main/java/org/apache/druid/cli/DumpSegment.java
b/services/src/main/java/org/apache/druid/cli/DumpSegment.java
index a98b9aafdbf..4084aa92810 100644
--- a/services/src/main/java/org/apache/druid/cli/DumpSegment.java
+++ b/services/src/main/java/org/apache/druid/cli/DumpSegment.java
@@ -470,7 +470,7 @@ public class DumpSegment extends GuiceRunnable
jg.writeFieldName("path");
jg.writeString(NestedPathFinder.toNormalizedJsonPath(field));
jg.writeFieldName("types");
- Set<ColumnType> types =
nestedDataColumn.getColumnTypes(field);
+ Set<ColumnType> types =
nestedDataColumn.getFieldTypes(field);
jg.writeStartArray();
for (ColumnType type : types) {
jg.writeString(type.asTypeString());
@@ -617,7 +617,7 @@ public class DumpSegment extends GuiceRunnable
jg.writeFieldName(path);
jg.writeStartObject();
jg.writeFieldName("types");
- Set<ColumnType> types =
nestedDataColumn.getColumnTypes(pathParts);
+ Set<ColumnType> types =
nestedDataColumn.getFieldTypes(pathParts);
jg.writeStartArray();
for (ColumnType type : types) {
jg.writeString(type.asTypeString());
diff --git
a/sql/src/test/java/org/apache/druid/sql/calcite/BaseCalciteQueryTest.java
b/sql/src/test/java/org/apache/druid/sql/calcite/BaseCalciteQueryTest.java
index 53f292d80fc..fdcd27061ab 100644
--- a/sql/src/test/java/org/apache/druid/sql/calcite/BaseCalciteQueryTest.java
+++ b/sql/src/test/java/org/apache/druid/sql/calcite/BaseCalciteQueryTest.java
@@ -1047,7 +1047,8 @@ public class BaseCalciteQueryTest extends CalciteTestBase
i,
types.get(i),
expectedCell,
- resultCell);
+ resultCell
+ );
}
}
}
diff --git
a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
index c892c473269..abedda26690 100644
---
a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
+++
b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java
@@ -6002,7 +6002,7 @@ public class CalciteNestedDataQueryTest extends
BaseCalciteQueryTest
"1",
"[]",
"[51,-35]",
-
"{\"a\":700,\"b\":{\"x\":\"g\",\"y\":1.1,\"z\":[9,null,9,9]},\"v\":[]}",
+
"{\"a\":700,\"b\":{\"x\":\"g\",\"y\":1.1,\"z\":[9,null,9,9]},\"c\":null,\"v\":[]}",
"{\"x\":400,\"y\":[{\"l\":[null],\"m\":100,\"n\":5},{\"l\":[\"a\",\"b\",\"c\"],\"m\":\"a\",\"n\":1}],\"z\":{}}",
null,
"[\"a\",\"b\"]",
@@ -6040,7 +6040,7 @@ public class CalciteNestedDataQueryTest extends
BaseCalciteQueryTest
"\"b\"",
"2",
"b",
-
"{\"a\":200,\"b\":{\"x\":\"b\",\"y\":1.1,\"z\":[2,4,6]},\"v\":[]}",
+
"{\"a\":200,\"b\":{\"x\":\"b\",\"y\":1.1,\"z\":[2,4,6]},\"c\":[\"a\",\"b\"],\"v\":[]}",
"{\"x\":10,\"y\":[{\"l\":[\"b\",\"b\",\"c\"],\"m\":\"b\",\"n\":2},[1,2,3]],\"z\":{\"a\":[5.5],\"b\":false}}",
"[\"a\",\"b\",\"c\"]",
"[null,\"b\"]",
@@ -6078,7 +6078,7 @@ public class CalciteNestedDataQueryTest extends
BaseCalciteQueryTest
"1",
"1",
"1",
-
"{\"a\":100,\"b\":{\"x\":\"a\",\"y\":1.1,\"z\":[1,2,3,4]},\"v\":[]}",
+
"{\"a\":100,\"b\":{\"x\":\"a\",\"y\":1.1,\"z\":[1,2,3,4]},\"c\":100,\"v\":[]}",
"{\"x\":1234,\"y\":[{\"l\":[\"a\",\"b\",\"c\"],\"m\":\"a\",\"n\":1},{\"l\":[\"a\",\"b\",\"c\"],\"m\":\"a\",\"n\":1}],\"z\":{\"a\":[1.1,2.2,3.3],\"b\":true}}",
"[\"a\",\"b\"]",
"[\"a\",\"b\"]",
@@ -6116,7 +6116,7 @@ public class CalciteNestedDataQueryTest extends
BaseCalciteQueryTest
"{}",
"4",
"1",
-
"{\"a\":400,\"b\":{\"x\":\"d\",\"y\":1.1,\"z\":[3,4]},\"v\":[]}",
+
"{\"a\":400,\"b\":{\"x\":\"d\",\"y\":1.1,\"z\":[3,4]},\"c\":{\"a\":1},\"v\":[]}",
"{\"x\":1234,\"z\":{\"a\":[1.1,2.2,3.3],\"b\":true}}",
"[\"d\",\"e\"]",
"[\"b\",\"b\"]",
@@ -6154,7 +6154,7 @@ public class CalciteNestedDataQueryTest extends
BaseCalciteQueryTest
"{}",
"[]",
"hello",
- "{\"a\":500,\"b\":{\"x\":\"e\",\"z\":[1,2,3,4]},\"v\":\"a\"}",
+
"{\"a\":500,\"b\":{\"x\":\"e\",\"z\":[1,2,3,4]},\"c\":\"hello\",\"v\":\"a\"}",
"{\"x\":11,\"y\":[],\"z\":{\"a\":[null],\"b\":false}}",
null,
null,
@@ -6192,7 +6192,7 @@ public class CalciteNestedDataQueryTest extends
BaseCalciteQueryTest
"\"a\"",
"6",
null,
-
"{\"a\":600,\"b\":{\"x\":\"f\",\"y\":1.1,\"z\":[6,7,8,9]},\"v\":\"b\"}",
+
"{\"a\":600,\"b\":{\"x\":\"f\",\"y\":1.1,\"z\":[6,7,8,9]},\"c\":12.3,\"v\":\"b\"}",
null,
"[\"a\",\"b\"]",
null,
@@ -7263,4 +7263,127 @@ public class CalciteNestedDataQueryTest extends
BaseCalciteQueryTest
RowSignature.builder().add("EXPR$0", ColumnType.STRING).build()
);
}
+
+ @Test
+ public void testSumPathWithArrays()
+ {
+ /*
+ "obj":{... "c": 100, ...}
+ "obj":{... "c": ["a", "b"], ...}
+ "obj":{...}
+ "obj":{... "c": {"a": 1}, ...},
+ "obj":{... "c": "hello", ...},
+ "obj":{... "c": 12.3, ...},
+ "obj":{... "c": null, ...},
+ */
+ testQuery(
+ "SELECT "
+ + "SUM(JSON_VALUE(obj, '$.c')) "
+ + "FROM druid.all_auto",
+ ImmutableList.of(
+ Druids.newTimeseriesQueryBuilder()
+ .dataSource(DATA_SOURCE_ALL)
+ .intervals(querySegmentSpec(Filtration.eternity()))
+ .granularity(Granularities.ALL)
+ .virtualColumns(new NestedFieldVirtualColumn("obj", "$.c", "v0", ColumnType.DOUBLE))
+ .aggregators(aggregators(new DoubleSumAggregatorFactory("a0", "v0")))
+ .context(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ ImmutableList.of(
+ new Object[]{112.3d}
+ ),
+ RowSignature.builder()
+ .add("EXPR$0", ColumnType.DOUBLE)
+ .build()
+ );
+ }
+
+ @Test
+ public void testCountPathWithArrays()
+ {
+ /*
+ "obj":{... "c": 100, ...}
+ "obj":{... "c": ["a", "b"], ...}
+ "obj":{...}
+ "obj":{... "c": {"a": 1}, ...},
+ "obj":{... "c": "hello", ...},
+ "obj":{... "c": 12.3, ...},
+ "obj":{... "c": null, ...},
+ */
+ // capturing existing behavior... the count would be 4 if it counted all non-null primitive values, but that
+ // would mean the virtual column would need to plan with an ARRAY<STRING> expected type instead of STRING
+ // ... you might notice there are actually 5 non-null obj.c values, but json_value only returns primitive
+ // values, so the object row is rightfully skipped
+ testQuery(
+ "SELECT "
+ + "COUNT(JSON_VALUE(obj, '$.c')) "
+ + "FROM druid.all_auto",
+ ImmutableList.of(
+ Druids.newTimeseriesQueryBuilder()
+ .dataSource(DATA_SOURCE_ALL)
+ .intervals(querySegmentSpec(Filtration.eternity()))
+ .granularity(Granularities.ALL)
+ .virtualColumns(new NestedFieldVirtualColumn("obj", "$.c", "v0", ColumnType.STRING))
+ .aggregators(
+ aggregators(
+ new FilteredAggregatorFactory(
+ new CountAggregatorFactory("a0"),
+ not(isNull("v0"))
+ )
+ )
+ )
+ .context(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ ImmutableList.of(
+ new Object[]{3L}
+ ),
+ RowSignature.builder()
+ .add("EXPR$0", ColumnType.LONG)
+ .build()
+ );
+ }
+
+ @Test
+ public void testCountPathWithArraysReturning()
+ {
+ /*
+ "obj":{... "c": 100, ...}
+ "obj":{... "c": ["a", "b"], ...}
+ "obj":{...}
+ "obj":{... "c": {"a": 1}, ...},
+ "obj":{... "c": "hello", ...},
+ "obj":{... "c": 12.3, ...},
+ "obj":{... "c": null, ...},
+ */
+ testQuery(
+ "SELECT "
+ + "COUNT(JSON_VALUE(obj, '$.c' RETURNING VARCHAR ARRAY)) "
+ + "FROM druid.all_auto",
+ ImmutableList.of(
+ Druids.newTimeseriesQueryBuilder()
+ .dataSource(DATA_SOURCE_ALL)
+ .intervals(querySegmentSpec(Filtration.eternity()))
+ .granularity(Granularities.ALL)
+ .virtualColumns(new NestedFieldVirtualColumn("obj", "$.c", "v0", ColumnType.STRING_ARRAY))
+ .aggregators(
+ aggregators(
+ new FilteredAggregatorFactory(
+ new CountAggregatorFactory("a0"),
+ not(isNull("v0"))
+ )
+ )
+ )
+ .context(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ ImmutableList.of(
+ new Object[]{4L}
+ ),
+ RowSignature.builder()
+ .add("EXPR$0", ColumnType.LONG)
+ .build()
+ );
+ }
}
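The arithmetic behind the new testSumPathWithArrays expectation can be checked outside Druid. The sketch below is a hypothetical illustration, not Druid's code: `coerceToDouble` simply mimics what the test rows show, namely that arrays, objects, non-numeric strings, and missing values read as null under a numeric JSON_VALUE.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of why SUM(JSON_VALUE(obj, '$.c')) over the seven
// test rows is 112.3: only the LONG 100 and the DOUBLE 12.3 survive coercion
// to a numeric scalar; everything else reads as null.
public class JsonValueSumCheck
{
  static Double coerceToDouble(Object value)
  {
    if (value instanceof Number) {
      return ((Number) value).doubleValue();
    }
    return null; // strings, arrays, objects, and null coerce to null here
  }

  public static void main(String[] args)
  {
    // the obj.c values added to nested-all-types-test-data.json by this patch
    List<Object> objC = Arrays.asList(
        100L, Arrays.asList("a", "b"), null, Map.of("a", 1), "hello", 12.3, null
    );
    double sum = 0;
    for (Object v : objC) {
      Double d = coerceToDouble(v);
      if (d != null) {
        sum += d;
      }
    }
    System.out.println(sum); // 112.3, matching the test's expected row
  }
}
```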
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]