Hey Andre,

Thanks for raising this. From the beginning, this was intended more as a
convenience helper, but I agree that ideally PyIceberg should be used in
the engine as a library to handle the Iceberg metadata.

That said, I don't think it places a real burden on the PyIceberg community,
except that sometimes we hit a library that does some monkey patching
<https://github.com/apache/iceberg-python/pull/2401>. It also sometimes
helps us identify which methods are used by libraries that depend on
PyIceberg. Some libraries like to use fields that are marked as private
<https://github.com/Eventual-Inc/Daft/pull/3917> by Python convention :)

> For example, Bodo currently pins PyArrow to version 19.0, which could
> potentially block us from adopting newer PyArrow features (e.g., UUID
> support in 21.0)


I think this is the biggest issue, but we also use Arrow quite extensively
in PyIceberg itself. So we should probably also test against the lower bound
that we support
<https://github.com/apache/iceberg-python/blob/52d810efb62e39ec6d8d6a2f4cd2cad8165e2d2c/pyproject.toml#L66>.
Personally, I would love to upgrade PyArrow more aggressively, but we also
have to take into account our users who are still locked to an older
version.

Kind regards,
Fokko


On Fri, 5 Sep 2025 at 03:38, André Luis Anastácio
<ndrl...@proton.me.invalid> wrote:

> Hi everyone,
>
> I'm starting this discussion thread about the optional third-party
> dependencies we currently maintain in PyIceberg to support to_*()
> conversion methods (e.g., to_daft(), to_polars(), to_pandas(), to_ray(),
> to_duckdb(), etc.).
>
> While this integration provides a great user experience by offering
> seamless conversions, it creates some maintenance challenges for the
> PyIceberg project:
>
> *Maintenance burden*: We're responsible for ensuring compatibility with all
> these external libraries and any future additions
>
> *Version conflicts*: Some tools have specific PyArrow version
> requirements that can conflict with PyIceberg's needs. For example, Bodo
> currently pins PyArrow to version 19.0, which could potentially block us
> from adopting newer PyArrow features (e.g., UUID support in 21.0)
>
> *Dependency management complexity*: Managing compatibility across
> multiple external libraries adds complexity to our release cycle
>
> IMHO, rather than PyIceberg maintaining integrations with external
> libraries, perhaps these libraries should implement their own PyIceberg
> support.
>
> I'd love to hear the community's thoughts on this approach. Has anyone
> else encountered similar challenges, or are there benefits to the current
> model that I might be overlooking?
>
> André Anastácio
>
