[DISCUSS] PyIceberg optional third-party dependencies lock-in

André Luis Anastácio Thu, 04 Sep 2025 18:38:31 -0700

Hi everyone,

I'm starting this discussion thread about the optional third-party dependencies 
we currently maintain in PyIceberg to support to_*() conversion methods (e.g., 
to_daft(), to_polars(), to_pandas(), to_ray(), to_duckdb(), etc.).


While this integration provides a great user experience by offering seamless 
conversions, it creates some maintenance challenges for the PyIceberg project:

Maintenance burden: We're responsible for ensuring compatibility with all these 
external libraries and any future additions.

Version conflicts: Some tools have specific PyArrow version requirements that 
can conflict with PyIceberg's needs. For example, Bodo currently pins PyArrow 
to version 19.0, which could block us from adopting newer PyArrow features 
(e.g., UUID support in 21.0).

Dependency management complexity: Managing compatibility across multiple 
external libraries adds complexity to our release cycle.

IMHO, rather than PyIceberg maintaining integrations with external libraries, 
perhaps these libraries should implement their own PyIceberg support.

I'd love to hear the community's thoughts on this approach. Has anyone else 
encountered similar challenges, or are there benefits to the current model that 
I might be overlooking?

André Anastácio
