alamb commented on issue #7845:
URL: https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2066290235

   Nice, thank you for the offer and the information, @samuelcolvin and @adriangb.
   
   ## High level proposal
   I think it would initially be possible to implement JSON support using the existing DataFusion APIs (and implement it in a separate crate). I think @philippemnoel is also interested in such an endeavor.
   
   Using the existing extension APIs would both:
   1. Allow initial iteration to go faster (outside of the ASF governance process)
   2. Ensure that the DataFusion APIs are sufficient for the use case
   
   While I don't think I personally have the bandwidth to help implement the JSON functionality, I think the API design is critical to the success of DataFusion, and I would be very interested in helping make it happen.
   
   ## Specific proposal
   So, in this case the solution might look like this:
   1. Create a new repo and crate, such as `datafusion-functions-json`
   2. Follow the model of the `functions-array` crate that is built into DataFusion (kudos to @jayzhan211 for making most of that happen): https://github.com/apache/arrow-datafusion/blob/19356b26f515149f96f9b6296975a77ac7260149/datafusion/functions-array/src/lib.rs#L102-L149
   3. Add the appropriate functions (like `json_contains`) as `ScalarUDF`s
   4. Add a rewrite pass to rewrite [JSON operators](https://www.postgresql.org/docs/9.5/functions-json.html) like `->` and `->>` into the appropriate function calls (see `ArrayFunctionRewriter` for an example: https://github.com/apache/arrow-datafusion/blob/19356b26f515149f96f9b6296975a77ac7260149/datafusion/functions-array/src/rewrite.rs#L41-L64)
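   As a rough, std-only illustration of step 4, here is a toy operator-to-function rewrite. The `Expr` enum below is a hypothetical miniature, not DataFusion's `Expr`, and the target names `json_get` / `json_get_str` are placeholders; in DataFusion itself this role would be played by a rewriter along the lines of the linked `ArrayFunctionRewriter`.

   ```rust
   /// A toy expression tree (hypothetical; NOT DataFusion's `Expr`), with just
   /// enough structure to show an operator-to-function rewrite.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Literal(String),
       /// `left -> right`: get a JSON field
       JsonArrow(Box<Expr>, Box<Expr>),
       /// `left ->> right`: get a JSON field as text
       JsonLongArrow(Box<Expr>, Box<Expr>),
       /// A scalar function call by name
       Call(String, Vec<Expr>),
   }

   /// Replaces JSON operators with calls to the placeholder functions
   /// `json_get` / `json_get_str`, recursing into every operand.
   fn rewrite_json_operators(expr: Expr) -> Expr {
       match expr {
           Expr::JsonArrow(l, r) => Expr::Call(
               "json_get".to_string(),
               vec![rewrite_json_operators(*l), rewrite_json_operators(*r)],
           ),
           Expr::JsonLongArrow(l, r) => Expr::Call(
               "json_get_str".to_string(),
               vec![rewrite_json_operators(*l), rewrite_json_operators(*r)],
           ),
           Expr::Call(name, args) => Expr::Call(
               name,
               args.into_iter().map(rewrite_json_operators).collect(),
           ),
           // Columns and literals contain no operators; keep them unchanged.
           leaf @ (Expr::Column(_) | Expr::Literal(_)) => leaf,
       }
   }
   ```

   Because the rewrite recurses into operands, a chained expression such as `attributes -> 'dims' ->> 'width'` becomes the nested call `json_get_str(json_get(attributes, 'dims'), 'width')`, so the rest of the engine only ever sees ordinary scalar function calls.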
   
   Then using the JSON functionality would look something like this:
   ```rust
   use datafusion::prelude::*;

   // `register_all` is the entry point the proposed crate would expose
   let ctx = SessionContext::new();
   datafusion_functions_json::register_all(&ctx)?;

   let results = ctx
       .sql("SELECT count(*) FROM records WHERE json_contains(attributes, 'size')")
       .await?
       .collect()
       .await?;
   ```
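   To make `register_all` and `json_contains` a bit more concrete, here is a std-only toy sketch. `Registry`, `ScalarFn`, and this `json_contains` are hypothetical stand-ins: a real implementation would register `ScalarUDF`s with the `SessionContext` and evaluate whole Arrow arrays rather than one string at a time. Here `json_contains(json, key)` is assumed to mean "`key` is a top-level key of the JSON object", and the scanner assumes well-formed input.

   ```rust
   use std::collections::HashMap;

   /// Toy per-row scalar function signature (hypothetical; DataFusion
   /// `ScalarUDF`s operate on Arrow arrays, not single strings).
   type ScalarFn = fn(&[&str]) -> bool;

   /// Hypothetical stand-in for a function registry such as `SessionContext`.
   #[derive(Default)]
   struct Registry {
       functions: HashMap<String, ScalarFn>,
   }

   impl Registry {
       fn register(&mut self, name: &str, f: ScalarFn) {
           self.functions.insert(name.to_string(), f);
       }
   }

   /// Toy `json_contains(json, key)`: true if `key` is a top-level key of the
   /// JSON object in `json`. Assumes well-formed input; escapes are handled
   /// only loosely (the escaped char is kept as-is), so this is not a real parser.
   fn json_contains(args: &[&str]) -> bool {
       let (json, key) = (args[0], args[1]);
       // Only objects have keys.
       if !json.trim_start().starts_with('{') {
           return false;
       }
       let mut depth = 0; // nesting level; 1 = inside the top-level object
       let mut expect_key = false; // next string at depth 1 is a key
       let mut chars = json.chars();
       while let Some(c) = chars.next() {
           match c {
               '{' | '[' => {
                   depth += 1;
                   expect_key = depth == 1;
               }
               '}' | ']' => depth -= 1,
               ',' if depth == 1 => expect_key = true,
               '"' => {
                   // Read the string literal up to its closing quote.
                   let mut s = String::new();
                   while let Some(ch) = chars.next() {
                       match ch {
                           '\\' => {
                               if let Some(escaped) = chars.next() {
                                   s.push(escaped);
                               }
                           }
                           '"' => break,
                           _ => s.push(ch),
                       }
                   }
                   if depth == 1 && expect_key && s == key {
                       return true;
                   }
                   // A string right after a key is its value, not a key.
                   expect_key = false;
               }
               _ => {}
           }
       }
       false
   }

   /// Hypothetical `register_all`, following the functions-array pattern of a
   /// single entry point that registers every function the crate provides.
   fn register_all(registry: &mut Registry) {
       registry.register("json_contains", json_contains);
   }
   ```

   A caller runs `register_all` once and then looks functions up by name, the same opt-in shape as the `register_all(&ctx)` call in the example.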
   
   ## Next Steps
   Once we have that working, we could then discuss with the community whether to incorporate this set of functions into the main DataFusion repo for maintenance (given the demand, I suspect this would be a popular idea).
   
   ## New Repo
   In case anyone is interested, I created https://github.com/datafusion-contrib/datafuison-functions-json in the `datafusion-contrib` organization. I can add you as an admin if you would like.

