Hi all, I’m sharing an RFC to prepare PyIceberg’s public API for 1.0.0. We’ve wanted to refine the list of public APIs for some time, but it’s been difficult because many changes touch user-facing contracts. I believe the right way to get started is to first agree on the approach for identifying what we want to expose, then split the work into smaller, incremental changes that we can complete over time.
At a high level, I'm proposing the following approach:

- Use "__all__" as the single source of truth for the curated symbols in each module.
- Classify modules as Intended Public (Full), Intended Public (Subset), or Internal.
- Roll out with deprecations first and remove deprecation warnings in the 1.0.0 release (no user-visible breaks during the transition: all symbols remain importable).
- Add CI guardrails to detect breaking changes.
- Optionally re-export a minimal, discoverable subset at the top-level pyiceberg module.

I’m asking for input on the following in this thread:

- Agreement (or objections) on the "__all__"-based approach to explicitly declaring the public API.
- Agreement on the per-module curation model, so we can kick off the work and split it into smaller increments.
- Agreement on using the Intended Public (Full) / Intended Public (Subset) / Internal classifications at the module level, to reach high-level consensus on the API structure and organize the work.

If the approach looks good, I’ll draft an initial module-level API classification (a mapping of each top-level module to one of the classifications proposed above) and share it in a follow-up DISCUSS thread to build lazy consensus at the module level. Per-symbol decisions within each module and its submodules can then be made through sub-issues/PRs.

RFC Link: https://docs.google.com/document/d/1-0-2Wx8saf3EQQW6AyMtPxlBLs7P5SJGsgimLQ4E_1Y/edit?usp=sharing

Best,
Sung Yun
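
For illustration only, here is a rough sketch of how "__all__" and a deprecation-first rollout could fit together in a single module. It is not taken from the RFC: the module name demo_module, the functions load_table and parse_location, and the warning text are all hypothetical, and the module is built in memory just to keep the example self-contained. It relies on the module-level __getattr__ hook from PEP 562, which is one possible mechanism among several.

```python
import types
import warnings

# Hypothetical module, built in memory so the sketch is self-contained.
demo = types.ModuleType("demo_module")


def load_table(name):
    """Stand-in for a symbol we intend to keep public."""
    return f"table:{name}"


def _parse_location(path):
    """Stand-in for a helper we no longer want to expose."""
    return path.rstrip("/")


# Curated public surface: only names listed here are considered public.
demo.load_table = load_table
demo.__all__ = ["load_table"]

# Names that stay importable during the transition but emit a warning.
_DEPRECATED = {"parse_location": _parse_location}


def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    if name in _DEPRECATED:
        warnings.warn(
            f"demo_module.{name} is not part of the public API "
            "and may be removed in a future release",
            DeprecationWarning,
            stacklevel=2,
        )
        return _DEPRECATED[name]
    raise AttributeError(f"module 'demo_module' has no attribute {name!r}")


demo.__getattr__ = _module_getattr
```

With this shape, demo.load_table resolves normally, while demo.parse_location still works but raises a DeprecationWarning, so nothing breaks for users during the transition. A CI guardrail could then compare each module's "__all__" against a checked-in snapshot, though the exact tooling is a separate decision.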