Hey Alex,

Thanks for raising this. I'm happy to provide some historical context.

Back when I was developing PyIceberg, I tried to rely purely on the output
of the generator that we use to produce rest-catalog-open-api.py. However,
I quickly noticed that our structures are too complex to express in the
OpenAPI definition (at least they were back then). One prominent example is
how schemas are encoded (related: issue #6798
<https://github.com/apache/iceberg/issues/6798> and PR #6672
<https://github.com/apache/iceberg/pull/6672>): a type can be either a
string (e.g. fixed[22]) or an object (e.g. {"type": "map", ...}).
Unfortunately, in PyIceberg we had to add some deserialization logic
<https://github.com/apache/iceberg-python/blob/52d810efb62e39ec6d8d6a2f4cd2cad8165e2d2c/pyiceberg/types.py#L126-L128>
to make this work. There are more examples, such as the arbitrary fields
in the snapshot summary, which need some additional TLC during validation.
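To illustrate the string-or-object problem, here is a minimal sketch of a
parser that accepts both forms. It is not PyIceberg's actual implementation:
the classes and the parse_type helper are hypothetical, the field names
(element-id, key-id, value-required, ...) are assumed from the Iceberg table
spec's JSON serialization, and struct types are left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import Any, Union


# Hypothetical, simplified type classes for illustration only
# (not PyIceberg's real classes).
@dataclass(frozen=True)
class PrimitiveType:
    name: str


@dataclass(frozen=True)
class FixedType:
    length: int


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int


@dataclass(frozen=True)
class ListType:
    element_id: int
    element: "IcebergType"
    element_required: bool


@dataclass(frozen=True)
class MapType:
    key_id: int
    key: "IcebergType"
    value_id: int
    value: "IcebergType"
    value_required: bool


IcebergType = Union[PrimitiveType, FixedType, DecimalType, ListType, MapType]

# Parameterized primitives are encoded as strings such as "fixed[22]"
# or "decimal(9, 2)".
_FIXED = re.compile(r"fixed\[(\d+)\]")
_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def parse_type(raw: Any) -> IcebergType:
    """Parse a JSON-encoded type that may be a string OR an object."""
    if isinstance(raw, str):
        if m := _FIXED.fullmatch(raw):
            return FixedType(int(m.group(1)))
        if m := _DECIMAL.fullmatch(raw):
            return DecimalType(int(m.group(1)), int(m.group(2)))
        # Any other string is treated as a primitive name (e.g. "long"),
        # without validating it against the list of known primitives.
        return PrimitiveType(raw)
    if isinstance(raw, dict):
        # Nested types are objects; the "type" key tells us which shape it is,
        # and their element/key/value types are parsed recursively.
        kind = raw.get("type")
        if kind == "list":
            return ListType(
                raw["element-id"], parse_type(raw["element"]), raw["element-required"]
            )
        if kind == "map":
            return MapType(
                raw["key-id"],
                parse_type(raw["key"]),
                raw["value-id"],
                parse_type(raw["value"]),
                raw["value-required"],
            )
        raise ValueError(f"unsupported nested type: {kind!r}")
    raise TypeError(f"type must be a string or an object, got {type(raw).__name__}")
```

For example, parse_type("fixed[22]") returns FixedType(length=22), while a
{"type": "map", ...} object is parsed recursively into a MapType whose key
and value may themselves be either strings or objects.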

When adding new request/response models, I think it makes sense to copy
them from the generated code, but they will probably need some more work
to make them usable. (Tip for anyone who's interested in picking up
iceberg-python#2302 <https://github.com/apache/iceberg-python/issues/2302>.)
However, I don't think it makes a lot of sense to publish them as-is.

Kind regards,
Fokko

On Sat, 6 Sep 2025 at 01:38, Alex Stephen
<alexstep...@google.com.invalid> wrote:

> Hi all,
>
> I noticed that we generate a set of Python models
> <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.py>
> containing Request/Response objects for the REST Catalog.
>
> PyIceberg has recreated many of these models
> <https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/rest/__init__.py>,
> albeit with less detail in many cases.
>
> I'd like to propose that we publish the rest-catalog-open-api.py
> <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.py>
> file as a Python package that PyIceberg can import. This would allow us to
> keep the REST Catalog models and the Python models in sync. It would also
> allow other Python packages to pick up the REST Catalog schema while
> staying in sync with any changes that we make to the REST Catalog.
>
> We could publish this new Python package from the main Iceberg repo.
> Preferably, we'd release it separately from the Java library so that the
> Java and Python sides can develop concurrently.
>
> Any thoughts?
>
> Thanks!
>
> -- Alex Stephen
>
