Hi All,

Thank you for articulating your crucial points regarding the proposed
integration of `datafusion-json` from `datafusion-contrib` into
`datafusion-python`. Your perspective that `datafusion-json` acts as a
direct extension of DataFusion itself, rather than a standalone third-party
dependency, is well-taken and highlights a significant governance and
dependency challenge that we need to address.

The concern that an official Apache project like `datafusion-python` could
become dependent on an unofficial extension (which may not consistently keep
pace with DataFusion core updates) is indeed serious, particularly regarding
potential upgrade blocking. To fully grasp the extent of this risk, could
you elaborate on the specific scenarios where you foresee
`datafusion-python` upgrades being directly impeded by the `datafusion-json`
dependency? Understanding the precise mechanisms of this potential blockage
will be vital as we consider viable pathways forward.

It would also be helpful to discuss potential mitigation strategies. Are
there approaches we could explore that would allow for the desired
functionality while addressing the core dependency and governance issues
you've raised, perhaps by more clearly decoupling the lifecycles or by
formalizing the `datafusion-json` project under Apache auspices?

Your insights on how best to navigate these complexities, ensuring both
functionality and the long-term health and integrity of `datafusion-python`,
would be greatly appreciated.

Thank you.

Regards,
Vignesh

On Tue, 31 Mar 2026 at 12:18, Phillip LeBlanc <[email protected]> wrote:

> It’s not exactly like any other 3rd party crate - this library explicitly
> depends on (and extends) DataFusion.
>
> This means that new versions of datafusion-python (an official DataFusion
> project) now depend on an unofficial extension project to first upgrade
> to the newer DataFusion version before the updated datafusion-python
> crate can be released.
>
> From: Kevin Liu <[email protected]>
> Date: Tuesday, March 31, 2026 at 2:22 AM
> To: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] Question on pulling in contrib content to
> datafusion-python
>
> I think we can treat it as pulling any other 3rd party crate/library.
> I see that it's marked as an optional dependency [1], which is great. It's
> also added as a feature [2]; I would suggest making it explicit that this
> is a community contribution, instead of Apache. So maybe rename the
> feature to `community_json` or something similar.
> We can also document in LICENSE/NOTICE/README that the library is a
> community contribution not affiliated with the Apache Software Foundation.
>
> Best,
> Kevin Liu
>
> [1]
> https://github.com/apache/datafusion-python/pull/1466/files#diff-bac59d6e5ada615de3d27a8e8f87d272613a80b5d3e4a7e2c2e4a08e63dcf0a1R56
> [2]
> https://github.com/apache/datafusion-python/pull/1466/files#diff-bac59d6e5ada615de3d27a8e8f87d272613a80b5d3e4a7e2c2e4a08e63dcf0a1R78
>
> On Mon, Mar 30, 2026 at 9:45 AM Andrew Lamb <[email protected]> wrote:
>
> > Another thing to consider is the maintenance burden (maybe not that bad)
> >
> > In my mind if we are going to distribute datafusion-python with the json
> > functions, we should bring datafusion-json functions under Apache
> > governance. Otherwise we might end up with a situation like a security
> > issue in an Apache product due to some other crate
> >
> > Of course, we already do this with the other third-party dependencies
> > (like `hashbrown` for example) so maybe it isn't that different 🤔
> >
> > I think the most important thing about bringing in code like that is
> > that we ensure that its IP provenance is clear (e.g. that the original
> > authors have made the donation explicitly under the Apache license).
> >
> > I am not sure who wrote the code in datafusion-json -- could we get them
> > to make the PR instead of a third party?
> >
> > Andrew
> >
> > On Mon, Mar 30, 2026 at 10:57 AM Luke Kim <[email protected]> wrote:
> >
> > > We (Spice AI) use the json crate and it would be nice to have it in,
> > > but I think the API should be reviewed for consistency before making
> > > it official and having people depend on it.
> > >
> > > It aligns to the PostgreSQL syntax but not exactly/completely.
> > >
> > > On Mon, Mar 30, 2026 at 7:39 AM, Tim Saucer <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > A recent PR[1] has been opened to bring in json scalar functions from
> > > the datafusion-contrib crate datafusion-functions-json. Before I move
> > > forward with either approving or closing this PR, I was wondering how
> > > the broader community felt about adding outside content like this. The
> > > code from datafusion-contrib is unofficial, so I'm hesitant to include
> > > it in our official release.
> > >
> > > I could see a second route which would be to add python support for
> > > all of those functions inside that contrib crate. But that means
> > > someone who maintains that code will also need to publish python
> > > packages in addition to their current rust code. It's not a huge
> > > burden, but it is additional work.
> > >
> > > I'd appreciate any thoughts you have on non-official crate functions
> > > being included.
> > >
> > > [1]: https://github.com/apache/datafusion-python/pull/1466
