Hey Andre,

Thanks for raising this. From the beginning, these methods were meant more as convenience helpers, but I agree that ideally the engine would use PyIceberg as a library to handle the Iceberg metadata.

That said, I don't think there is a real burden on the PyIceberg community, except that we sometimes hit a library that does some monkey patching <https://github.com/apache/iceberg-python/pull/2401>. Having the integrations also sometimes helps us identify which methods are used by libraries that build on PyIceberg. Some libraries like to use fields that are marked as private <https://github.com/Eventual-Inc/Daft/pull/3917> by Python convention :)

> For example, Bodo currently pins PyArrow to version 19.0, which could
> potentially block us from adopting newer PyArrow features (e.g: UUID
> support in 21.0)

I think this is the biggest issue, but we also use Arrow quite extensively in PyIceberg itself. So we should probably also test against the lower bound that we support <https://github.com/apache/iceberg-python/blob/52d810efb62e39ec6d8d6a2f4cd2cad8165e2d2c/pyproject.toml#L66>. Personally, I would love to upgrade PyArrow more aggressively, but we also have to take into account our users who are still locked to an older version.

Kind regards,
Fokko

On Fri, 5 Sep 2025 at 03:38, André Luis Anastácio <ndrl...@proton.me.invalid> wrote:

> Hi everyone,
>
> I'm starting this discussion thread about the optional third-party
> dependencies we currently maintain in PyIceberg to support to_*()
> conversion methods (e.g., to_daft(), to_polars(), to_pandas(), to_ray(),
> to_duckdb(), etc.).
>
> While this integration provides a great user experience by offering
> seamless conversions, it creates some maintenance challenges for the
> PyIceberg project:
>
> Maintenance burden: We're responsible for ensuring compatibility with all
> these external libraries and any future additions
>
> *Version conflicts*: Some tools have specific PyArrow version
> requirements that can conflict with PyIceberg's needs. For example, Bodo
> currently pins PyArrow to version 19.0, which could potentially block us
> from adopting newer PyArrow features (e.g: UUID support in 21.0)
>
> *Dependency management complexity*: Managing compatibility across
> multiple external libraries adds complexity to our release cycle
>
> IMHO rather than PyIceberg maintaining integrations with external
> libraries, perhaps these libraries should implement their own PyIceberg
> support
>
> I'd love to hear the community's thoughts on this approach. Has anyone
> else encountered similar challenges, or are there benefits to the current
> model that I might be overlooking?
>
> André Anastácio