alamb commented on issue #7845: URL: https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2066290235
Nice -- thank you for the offer and information @samuelcolvin and @adriangb

## High level proposal

I think it would initially be possible to implement JSON support using the existing datafusion APIs (and implement it in a separate crate). I think @philippemnoel is also interested in such an endeavor.

Using the existing extension APIs would both

1. Allow initial iteration to go faster (without the ASF governance)
2. Ensure that the datafusion APIs are sufficient for the usecase

While I don't think I personally have the bandwidth to help implement the JSON functionality, I think the API design is critical to the success of DataFusion and would be very interested in helping make it happen.

## Specific proposal

So, in this case the solution might look like

1. Create a new repo and crate like `datafusion-functions-json`
2. Follow the model of the `functions-array` that are built into DataFusion (kudos to @jayzhan211 for making most of that happen): https://github.com/apache/arrow-datafusion/blob/19356b26f515149f96f9b6296975a77ac7260149/datafusion/functions-array/src/lib.rs#L102-L149
3. Add the appropriate functions (like `json_contains`) as `ScalarUDF`s
4. Add a rewrite pass to rewrite [json operators](https://www.postgresql.org/docs/9.5/functions-json.html) like `->` into the appropriate function calls (see `ArrayFunctionRewriter` for an example: https://github.com/apache/arrow-datafusion/blob/19356b26f515149f96f9b6296975a77ac7260149/datafusion/functions-array/src/rewrite.rs#L41-L64)

Then using the JSON functionality would look something like

```rust
let ctx = SessionContext::new();
// Register the JSON functions (and operator rewrites) from the proposed crate
datafusion_functions_json::register_all(&ctx)?;

let results = ctx
    .sql("SELECT count(*) FROM records WHERE json_contains(attributes, 'size')")
    .await?
    .collect()
    .await?;
```

## Next Steps

Once we have that working, I think we could then discuss with the community whether we should incorporate this set of functions into the main datafusion repo for maintenance (given the demand, I suspect this would be a popular idea).

## New Repo

If anyone is interested, I created https://github.com/datafusion-contrib/datafuison-functions-json in the datafusion-contrib organization. I can add you as admin if you are interested.
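
To make step 4 of the specific proposal a bit more concrete, here is a rough, standard-library-only sketch of what an operator-to-function rewrite pass does. It is not DataFusion code: the `Expr` and `Op` types are simplified stand-ins for DataFusion's real expression tree, and the function names `json_get` and `json_get_str` are hypothetical placeholders for whatever the new crate ends up registering.

```rust
// Simplified, hypothetical expression tree (DataFusion's real `Expr` is much richer).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    /// A binary operator such as `attributes -> 'size'`.
    BinaryOp { left: Box<Expr>, op: Op, right: Box<Expr> },
    /// A call to a named scalar function.
    Call { name: String, args: Vec<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Arrow,     // ->
    LongArrow, // ->>
    Eq,        // =
}

/// Maps a JSON operator to the (hypothetical) function that implements it.
/// Returns `None` for operators this pass should leave alone.
fn json_function_for(op: Op) -> Option<&'static str> {
    match op {
        Op::Arrow => Some("json_get"),
        Op::LongArrow => Some("json_get_str"),
        Op::Eq => None,
    }
}

/// Rewrites the tree bottom-up, replacing JSON operators with function calls.
fn rewrite(expr: Expr) -> Expr {
    match expr {
        Expr::BinaryOp { left, op, right } => {
            // Rewrite the children first so nested operators are handled too.
            let left = rewrite(*left);
            let right = rewrite(*right);
            match json_function_for(op) {
                Some(name) => Expr::Call {
                    name: name.to_string(),
                    args: vec![left, right],
                },
                None => Expr::BinaryOp {
                    left: Box::new(left),
                    op,
                    right: Box::new(right),
                },
            }
        }
        Expr::Call { name, args } => Expr::Call {
            name,
            args: args.into_iter().map(rewrite).collect(),
        },
        // Columns and literals have no children to rewrite.
        other => other,
    }
}

// Small constructors to keep example expressions readable.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(value: &str) -> Expr {
    Expr::Literal(value.to_string())
}

fn binary(left: Expr, op: Op, right: Expr) -> Expr {
    Expr::BinaryOp { left: Box::new(left), op, right: Box::new(right) }
}

fn call(name: &str, args: Vec<Expr>) -> Expr {
    Expr::Call { name: name.to_string(), args }
}
```

With these stand-in types, `rewrite(binary(col("attributes"), Op::Arrow, lit("size")))` yields the same tree as `call("json_get", vec![col("attributes"), lit("size")])`, while a plain `=` comparison passes through unchanged. In the real crate the same idea would presumably be expressed through DataFusion's own rewrite hook, as `ArrayFunctionRewriter` does for array operators.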
