[GitHub] [druid] clintropolis commented on issue #12695: Druid nested data columns

GitBox Wed, 06 Jul 2022 23:13:21 -0700


clintropolis commented on issue #12695:
URL: https://github.com/apache/druid/issues/12695#issuecomment-1177133110


   Hi @FrankChen021 thanks for having a look, sorry I haven't finished filling 
out the PR description for #12753 yet, it might have been able to answer some 
of your questions.
   
   >When using json_value function to extract a value, this function does not 
know what the actual type of the value in advance and maybe there might be some 
problems.
   
   For this, I have actually picked up a syntax that was implemented in the 
Calcite parser for these functions that allows specifying the type inline, 
looking something like
   
   ```JSON_VALUE(x, '$.some.nested' RETURNING BIGINT)```
   
   The way I have implemented stuff during SQL conversion will also "eat" cast 
operations, so wrapping `JSON_VALUE` in a `CAST` will plan to the same native 
virtual column as the 'returning' syntax would, decorated with the expected 
type. I don't know how standard this part of the syntax is, but it came free 
from using these functions in Calcite.
   
   Underlying `JSON_VALUE` is a thing called the 
[`NestedFieldVirtualColumn`](https://github.com/apache/druid/pull/12753/files#diff-2aee7245b26958e74829b61d97461262e1a92b2ff7b1a633028eef96b8c1ef62R62)
 which can make optimized type selectors as asked for by the expected type, as 
well as supply the column indexes and everything else which provides 'native' 
druid performance.
   
   
   >I don't know if json_value and json_query are the final SQL function names 
in your implementation, but I think they're a little confusion for people when 
they first touch these functions. json_extract is a more widely used function 
name in other database like MySQL, SQLite, ClickHouse etc, I think we can use 
it so people are easy to know what these functions do.
   
   I did some surveying as well and `JSON_VALUE` and `JSON_QUERY` are also 
relatively widely used, in fact it looks like even Clickhouse supports them 
(also big query, ms sql, oracle, etc). That said, it is relatively easy to add 
additional functions at the SQL layer, including optimized functions that plan 
into the `NestedFieldVirtualColumn`, you can see 2 of them still hanging out in 
my PR, `GET_PATH` and `JSON_GET_PATH` were my initial prototype functions and 
use 'jq' style paths instead of 'JSONPath', [see 
`NestedDataOperatorConversions`](https://github.com/apache/druid/pull/12753/files#diff-266bdd0d604d7441493d9b07a91dd663acd39472162b3c2e6384ca63b36c3a9dR145)
   
   >The final one is about the performance of application of JSONPath in above 
functions. There is a PR(https://github.com/apache/druid/pull/11467) compares 
the JSONPath-based implementation and a hand-writing implementation which turns 
out the latter one has much better performance. I'm not saying this 
hand-writing implementation should be used but just want to give you a hint.
   
   Heh, not to worry, I actually also wrote my own JSONPath parser (also jq 
parser) that only handles the subset of functionality that we actually support 
for these columns [see 
`NestedPathFinder`](https://github.com/apache/druid/pull/12753/files#diff-58d96677255bd0fd215bdf5d7d519181c48df622a30d957c0124acfd399bd672R67).
 I actually missed #11467, though I guess there is a bit of difference between 
what's going on in that PR, which looks like it has functions to handle 
stringified json inputs, vs what I'm proposing here, which is a new column type 
that decomposes the JSON at ingest time into indexed nested columns, but I'll 
have a look to see if there is any overlap. Thanks for pointing this out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] clintropolis commented on issue #12695: Druid nested data columns

Reply via email to