clintropolis commented on issue #17440:
URL: https://github.com/apache/druid/issues/17440#issuecomment-2464061655
fwiw a better way to write this query is
```
SELECT
ARRAY_CONTAINS(JSON_VALUE(agent, '$.type' RETURNING VARCHAR ARRAY),
'Browser')
...
FROM "kttm_nested_1"
```
since `JSON_VALUE` expressions can be optimized quite significantly*
compared to `JSON_QUERY`/`JSON_QUERY_ARRAY`, which process the raw json instead
of the specialized nested field columns that are created when ingesting json
columns. The * is because this is slightly dependent on which Druid version
created the json column: in 28+, nested arrays of primitive types are
stored much more optimally than in earlier versions of Druid, so if the
segments were created with an older version then `JSON_VALUE` would also fall
back to processing the raw data to construct the array.
Also, as a side note, the `ARRAY_` functions should be preferred over the `MV_`
functions when interacting with actual array types, such as those stored in json
columns and array columns. The `MV_` functions are primarily for use with
Druid's older non-standard multi-value string columns, which present themselves
as `VARCHAR` in the SQL layer and are sort of distinct from actual arrays, so
mixing these up can sometimes result in strange type inference in the SQL layer
when composing more complicated expressions.
Anyway, the reason the query is failing in this form is an
optimization for `ARRAY_CONTAINS` that maybe shouldn't happen when used with
`JSON_QUERY_ARRAY`, or rather when the native `array_contains` expression
(which backs both `ARRAY_CONTAINS` and `MV_CONTAINS` in SQL) encounters a
complex type. `JSON_QUERY_ARRAY` doesn't really know the type of the element it
is extracting, so it is handled as `COMPLEX<json>`. However, the specialization
check for primitive arrays tries to cast the rhs argument, when it is a
literal, to match the element type of the array it will be checking.
This can be fixed by modifying
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/Function.java#L3977
to instead be something like
```
if (lhsType == null || !(lhsType.isPrimitive() || lhsType.isPrimitiveArray())) {
  return this;
}
```
This fixes the issue because it bails out of the optimization path early
and so falls back to per-row checking of type information to decide how to
process the function.
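To make the bail-out logic concrete, here is a minimal standalone sketch of the
same guard. It is not Druid code: `SketchType` and `canSpecialize` are
hypothetical stand-ins for the real expression type and the specialization
check in `Function.java`, and only the shape of the condition is taken from the
snippet above.

```java
public class SpecializationGuardSketch {
    // Hypothetical stand-in for Druid's expression type; NOT the real class.
    public enum SketchType {
        STRING(true, false),
        LONG(true, false),
        DOUBLE(true, false),
        STRING_ARRAY(false, true),
        LONG_ARRAY(false, true),
        COMPLEX_JSON(false, false);

        private final boolean primitive;
        private final boolean primitiveArray;

        SketchType(boolean primitive, boolean primitiveArray) {
            this.primitive = primitive;
            this.primitiveArray = primitiveArray;
        }

        public boolean isPrimitive() {
            return primitive;
        }

        public boolean isPrimitiveArray() {
            return primitiveArray;
        }
    }

    // Returns true only when the literal-casting fast path is safe to take.
    // A null (unknown) or complex lhs type skips the fast path, which mirrors
    // the "return this" bail-out in the proposed fix.
    public static boolean canSpecialize(SketchType lhsType) {
        if (lhsType == null || !(lhsType.isPrimitive() || lhsType.isPrimitiveArray())) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("STRING_ARRAY: " + canSpecialize(SketchType.STRING_ARRAY));
        System.out.println("COMPLEX_JSON: " + canSpecialize(SketchType.COMPLEX_JSON));
        System.out.println("unknown type: " + canSpecialize(null));
    }
}
```

In this sketch only the primitive and primitive-array cases take the fast path,
while the `COMPLEX<json>`-like case and the unknown type both fall through to
the slower per-row handling.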
@vivek807 if you'd like to make a PR with that change and add a test for
this to
https://github.com/apache/druid/blob/master/processing/src/test/java/org/apache/druid/query/expression/NestedDataExpressionsTest.java#L394
I would be happy to approve it, else I can make a PR with a fix later this
week. We also need to handle `array_overlap(json_query_array(...))`, since it
has a similar optimization path and will need the same fix
(https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/math/expr/Function.java#L3724).