clintropolis opened a new pull request, #15521:
URL: https://github.com/apache/druid/pull/15521

   ### Description
   This PR adds `JSON_QUERY_ARRAY`, which is similar to `JSON_QUERY` but, 
instead of returning `COMPLEX<json>` for any value extracted from a JSON 
path, returns `ARRAY<COMPLEX<json>>`. This is currently implemented purely 
with `ExpressionVirtualColumn` via a `DirectOperatorConversion` rather than 
the specialized `NestedFieldVirtualColumn` used by `JSON_VALUE` and 
`JSON_QUERY`, mostly because there isn't much room for optimization yet. 
I would rather wait until we introduce specialized array column selectors 
in the future than try to extend the existing selectors of this virtual 
column to also handle arrays of objects.
   
   Similar to other array handling, values which are not arrays will be coerced 
into single element arrays, though I am open to discussion on this, since it 
would seem equally valid to handle them as null values...
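   
   As a rough illustration of the coercion rule described above, here is a 
minimal Python model (not Druid code). The function name `to_json_array` is 
hypothetical, and the handling of null input is an assumption, since the 
description does not say what happens in that case.

```python
from typing import Any, List, Optional


def to_json_array(value: Any) -> Optional[List[Any]]:
    """Model of the coercion rule: arrays pass through, other values are wrapped."""
    if value is None:
        # Assumption: a null/missing value stays null. The PR text does not
        # specify this case.
        return None
    if isinstance(value, list):
        # Already an array: returned unchanged.
        return value
    # Any non-array value (object, string, number) becomes a one-element array.
    return [value]
```

   Under the alternative raised above, the final branch would return `None` 
instead of wrapping the value.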
   
   This enables things like using `UNNEST` on arrays of objects to transform 
an array of JSON objects into rows of JSON objects.
   
   For example, take some data sourced from a discussion in a community Slack 
thread, which has top-level arrays of objects (this would also work with 
nested arrays of objects at some path):
   
   <img width="1184" alt="Screenshot 2023-12-08 at 12 01 39 AM" 
src="https://github.com/apache/druid/assets/1577461/7b1ecd93-1196-46b5-a0e5-36334856a443">
   
   We can use `JSON_QUERY_ARRAY` to do stuff like translate it to a separate 
row per object:
   
   <img width="899" alt="Screenshot 2023-12-08 at 12 02 33 AM" 
src="https://github.com/apache/druid/assets/1577461/e6b42909-6757-43da-bfe7-2d3f6d8df39d">
   
   
   and further use `JSON_VALUE` to extract values from these objects and do 
stuff like group or aggregate on them:
   
   <img width="772" alt="Screenshot 2023-12-08 at 12 04 08 AM" 
src="https://github.com/apache/druid/assets/1577461/6912aa46-1d10-4ac9-b578-aee929de2d79">
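   
   The queries themselves are only in the screenshots, so as a loose sketch of 
the data flow (one row per object, then grouping on an extracted field), 
here is a small Python model. It is not Druid code: the sample rows, the 
`items` column, the `kind` field, and the helper `unnest_objects` are all 
made up for illustration, and skipping null values is an assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def unnest_objects(rows: Iterable[Dict[str, Any]], column: str) -> Iterator[Any]:
    """Yield one element per array entry, roughly like UNNEST over JSON_QUERY_ARRAY."""
    for row in rows:
        value = row.get(column)
        if value is None:
            # Assumption: rows with no value contribute no output rows.
            continue
        # Non-array values are treated as single-element arrays.
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            yield element


# Hypothetical input: a top-level array of objects per row.
rows = [
    {"items": [{"kind": "a", "n": 1}, {"kind": "b", "n": 2}]},
    {"items": {"kind": "a", "n": 3}},
]

# One output row per object.
objects = list(unnest_objects(rows, "items"))

# Group on an extracted field, similar in spirit to JSON_VALUE plus GROUP BY.
counts = Counter(obj.get("kind") for obj in objects)
```

   Here `objects` plays the role of the unnested rows and `counts` the role of 
a grouped count over one extracted field.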
   
   Will add docs in a follow-up PR.
   
   #### Release note
   Added `JSON_QUERY_ARRAY` which is similar to `JSON_QUERY` except the return 
type is always `ARRAY<COMPLEX<json>>` instead of `COMPLEX<json>`. Essentially, 
this function allows extracting arrays of objects from nested data and 
performing operations such as `UNNEST`, `ARRAY_LENGTH`, `ARRAY_SLICE`, or any 
other available ARRAY operations. 
   
   <hr>
   
   This PR has:
   
   - [ ] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

