I wrote up a ticket to discuss adding the json functions (as well as
variant) into the core repo [1].

I would love to hear people's thoughts on this.

[1]: https://github.com/apache/datafusion/issues/21301

On Tue, Mar 31, 2026 at 7:05 PM Adrian Garcia Badaracco <[email protected]> wrote:

> We (Pydantic) are the original authors of `datafusion-functions-json`. We
> would be more than happy to donate it and to make any changes necessary to
> align it with Postgres if that's desired. I would personally like to see it
> in `datafusion-cli` as well. FWIW DuckDB bundles JSON functions out of the
> box: https://duckdb.org/docs/current/data/json/json_functions
>
> On Tue, Mar 31, 2026 at 1:57 AM Vignesh Siva <[email protected]> wrote:
>
> > Hi All,
> >
> > Thank you for articulating your crucial points regarding the proposed
> > integration of `datafusion-json` from `datafusion-contrib` into
> > `datafusion-python`. Your perspective that `datafusion-json` acts as a
> > direct extension of DataFusion itself, rather than a standalone
> > third-party dependency, is well-taken and highlights a significant
> > governance and dependency challenge that we absolutely need to address.
> >
> > The concern that an official Apache project like `datafusion-python`
> > could become dependent on an unofficial extension (which may not
> > consistently keep pace with DataFusion core updates) is indeed serious,
> > particularly regarding potential upgrade blocking. To fully grasp the
> > extent of this risk, could you elaborate further on the specific
> > scenarios where you foresee `datafusion-python` upgrades being directly
> > impeded by the `datafusion-json` dependency? Understanding the precise
> > mechanisms of this potential blockage will be vital as we consider
> > viable pathways forward.
> >
> > It would also be helpful to discuss potential mitigation strategies.
> > Are there approaches we could explore that would allow for the desired
> > functionality while robustly addressing the core dependency and
> > governance issues you've raised, perhaps by more clearly decoupling the
> > lifecycles or by formalizing the `datafusion-json` project under Apache
> > auspices? Your insights on how best to navigate these complexities,
> > ensuring both functionality and the long-term health and integrity of
> > `datafusion-python`, would be greatly appreciated.
> >
> > Thank you.
> >
> > Regards,
> > Vignesh
> >
> > On Tue, 31 Mar 2026 at 12:18, Phillip LeBlanc <[email protected]> wrote:
> >
> > > It’s not exactly like any other 3rd party crate - this library
> > > explicitly depends on (and extends) datafusion.
> > >
> > > This means that new versions of datafusion-python (an official
> > > Datafusion project) now depend on an unofficial extension project to
> > > first upgrade to the newer Datafusion version before the updated
> > > datafusion-python crate can be released.
> > >
> > > From: Kevin Liu <[email protected]>
> > > Date: Tuesday, March 31, 2026 at 2:22 AM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: [DISCUSS] Question on pulling in contrib content to
> > > datafusion-python
> > >
> > > I think we can treat it as pulling any other 3rd party crate/library.
> > > I see that it's marked as an optional dependency [1], which is great.
> > > It's also added as a feature [2]; I would suggest making it explicit
> > > that this is a community contribution, instead of apache. So maybe
> > > rename the feature to `community_json` or something similar.
> > > We can also document in LICENSE/NOTICE/README that the library is a
> > > community contribution not affiliated with the Apache Software
> > > Foundation.
> > >
> > > Best,
> > > Kevin Liu
> > >
> > > [1]
> > > https://github.com/apache/datafusion-python/pull/1466/files#diff-bac59d6e5ada615de3d27a8e8f87d272613a80b5d3e4a7e2c2e4a08e63dcf0a1R56
> > > [2]
> > > https://github.com/apache/datafusion-python/pull/1466/files#diff-bac59d6e5ada615de3d27a8e8f87d272613a80b5d3e4a7e2c2e4a08e63dcf0a1R78
> > >
> > > On Mon, Mar 30, 2026 at 9:45 AM Andrew Lamb <[email protected]> wrote:
> > >
> > > > Another thing to consider is the maintenance burden (maybe not that
> > > > bad)
> > > >
> > > > In my mind if we are going to distribute datafusion-python with the
> > > > json functions, we should bring datafusion-json functions under
> > > > apache governance. Otherwise we might end up with a situation like a
> > > > security issue in an Apache product due to some other crate
> > > >
> > > > Of course, we already do this with the other third-party
> > > > dependencies (like `hashbrown` for example) so maybe it isn't that
> > > > different 🤔
> > > >
> > > > I think the most important thing about bringing in code like that is
> > > > that we ensure that its IP provenance is clear (e.g. that the
> > > > original authors have made the donation explicitly under the apache
> > > > license).
> > > >
> > > > I am not sure who wrote the code in datafusion-json -- could we get
> > > > them to make the PR instead of a third party?
> > > >
> > > > Andrew
> > > >
> > > > On Mon, Mar 30, 2026 at 10:57 AM Luke Kim <[email protected]> wrote:
> > > >
> > > > > We (Spice AI) use the json crate and it would be nice to have it
> > > > > in, but I think the API should be reviewed for consistency before
> > > > > making it official and having people depend on it.
> > > > >
> > > > > It aligns to the PostgreSQL syntax but not exactly/completely.
> > > > >
> > > > > On Mon, Mar 30, 2026 at 7:39 AM, Tim Saucer <[email protected]> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > A recent PR[1] has been opened to bring in json scalar functions
> > > > > from the datafusion-contrib crate datafusion-functions-json. Before
> > > > > I move forward with either approving or closing this PR, I was
> > > > > wondering how the broader community felt about adding outside
> > > > > content like this. The code from datafusion-contrib is unofficial,
> > > > > so I'm hesitant to include it in our official release.
> > > > >
> > > > > I could see a second route which would be to add python support for
> > > > > all of those functions inside that contrib crate. But that means
> > > > > someone who maintains that code will also need to publish python
> > > > > packages in addition to their current rust code. It's not a huge
> > > > > burden, but it is additional work.
> > > > >
> > > > > I'd appreciate any thoughts you have on non-official crate
> > > > > functions being included.
> > > > >
> > > > > [1]: https://github.com/apache/datafusion-python/pull/1466
