clintropolis commented on PR #12753: URL: https://github.com/apache/druid/pull/12753#issuecomment-1183687543
>To me this sounds like the basics of the feature are production-ready. There may be various callouts about performance, but it seems that the compatibility story is tight enough that the feature doesn't need the experimental markings at this time.

I think that is fair :+1:

>You mentioned being somewhat less certain about the behavior of nested arrays. We should figure out if that part is going to be included in the production-ready feature set, or if we'll call that particular scenario out as an evolving area. What is your intent & recommendation in this area?

It is definitely going to be an evolving area (which I think could be said of our array support in general), though there is probably a narrow range of use cases that could be served today, mainly where array lengths and element positions are known and have some meaning, and query-time operations are primarily extracting and operating on individual elements. I think this is more or less the current limitation of `flattenSpec` with nested arrays.

There is some low-hanging fruit that would improve things in the near term, some of which might be possible to get in before the next release. The first is supporting wildcards in the subset of the path syntax that we support, which would allow `JSON_QUERY` and `JSON_VALUE` (or something like it; I'm not entirely sure how the `RETURNING` syntax would work with array types in SQL, so I need to do some tinkering there) to extract complete arrays. For `JSON_QUERY` these results would still be `COMPLEX<json>` typed, but `JSON_VALUE` would spit out Druid literal array types (`ARRAY<LONG>`, `ARRAY<STRING>`, etc).
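To make the distinction between positional extraction and wildcard extraction concrete, here is a minimal Python sketch. It is not Druid code and does not reflect how Druid evaluates paths internally; the `extract_path` helper, the list-based path representation, and the sample row are all hypothetical and exist only for illustration.

```python
def extract_path(doc, path):
    """Walk `doc` along `path`, a list of dict keys, int indexes, or "*".

    Hypothetical helper for illustration only. Without a wildcard the
    result is a single element (or None); with a wildcard it is a list.
    """
    results = [doc]
    used_wildcard = False
    for step in path:
        next_results = []
        for node in results:
            if step == "*":
                # Wildcard: fan out over every element of an array.
                if isinstance(node, list):
                    next_results.extend(node)
            elif isinstance(step, int):
                # Positional access: only meaningful when the position is known.
                if isinstance(node, list) and 0 <= step < len(node):
                    next_results.append(node[step])
            elif isinstance(node, dict) and step in node:
                next_results.append(node[step])
        if step == "*":
            used_wildcard = True
        results = next_results
    if used_wildcard:
        return results
    return results[0] if results else None


# Hypothetical nested row.
row = {"point": {"coords": [10, 20, 30]}, "tags": [{"k": "a"}, {"k": "b"}]}

second_coord = extract_path(row, ["point", "coords", 1])   # single element
all_coords = extract_path(row, ["point", "coords", "*"])   # complete array
tag_keys = extract_path(row, ["tags", "*", "k"])           # field from each object
```

The positional call corresponds to the case that works today (known lengths and positions), while the two wildcard calls correspond to the "extract complete arrays" case that wildcard support would enable.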
For nested arrays of JSON objects extracted by `JSON_QUERY`, I think we will want a way to convert a `COMPLEX<json>` into an `ARRAY<COMPLEX<json>>` so that they too can take part in array operations, _especially_ once we add a native `UNNEST` function to transform arrays into tables, which would be the path to exploding out these nested objects and performing operations on their contents. At some point after that, I intend to introduce the option to store literal arrays in nested `ARRAY` typed columns instead of breaking them out into separate columns for individual elements as is done currently (so that array operations don't have to decompress a bunch of separate columns to do stuff).

I guess I'm getting a bit into the weeds, but my point is that I think this feature will evolve alongside, and should help us improve, array support in general, so I am hyped to get it there.
