alamb commented on issue #7845:
URL: 
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1922212879

   I think there are two major themes here:
   
   ## Theme 1
   How to query such semi-structured data. 
   
   DataFusion today supports the Arrow type system, which while powerful can 
not describe arbitrary json schemas. For example you have to know up front what 
all the possible fields are for a StructArray
   
   Systems like Postgres handle this by adding a new type and operators (see 
[Postgres Docs]( https://www.postgresql.org/docs/current/datatype-json.html) 
for example the `containment` operators
   
   ```sql
   -- This array contains the primitive string value:
   SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb;
   
   -- String exists as array element:
   SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';
   
   -- String exists as object key:
   SELECT '{"foo": "bar"}'::jsonb ? 'foo';
   
   -- Object values are not considered:
   SELECT '{"foo": "bar"}'::jsonb ? 'bar';  -- yields false
   ```
   
   ## Theme 2: how to store such data efficiently
   
   Once we have some sort of idea how to query the data then there is the 
question about how to store / implement those queries efficiently. I think that 
is where 
https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-1779821351 
from @thinkharderdev  and https://github.com/ArroyoSystems/arroyo/pull/477 from 
@mwylde are coming from 
   
   ## Suggestions on paths forward
   
   I personally think it may be tough to reach a consensus on storing JSON vs 
BSON vs JSONA as they seem to have different usecases and pros/cons (interop vs 
performance for example)
   
   However, I could easily see this supported as "extension types" (basically 
views over StringArray/BinaryArray). 
   
   I think we could implement such support today using functions, albiet with a 
poor UX. Like why can't we do something like
   
   ```
   -- create jsonb constant from a json string, like `make_json`
   select make_jsonb('{"foo": "bar"}');
   
   -- create json string from jsonb
   select json_from_jsonb(<binary json>);
   
   --  test if data in binary_column contains the structure {foo: bar}
   select jsonb_contains(binary_column, make_jsonb('{"foo": "bar"}');
   
   ```
   
   🤔 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to