qzyu999 opened a new pull request, #3124:
URL: https://github.com/apache/iceberg-python/pull/3124
<!-- Closes #${1092} -->
# Rationale for this change
This introduces a simplified, whole-table compaction strategy via the
MaintenanceTable API (`table.maintenance.compact()`).
Key implementation details:
- Reads the entire table state into memory via `.to_arrow()`. This initial
implementation uses an in-memory Arrow-based rewrite strategy; future
iterations can extend it to support streaming or distributed rewrites for
larger-than-memory datasets.
- Uses `table.overwrite()` to rewrite data, leveraging PyIceberg's target
file bin-packing (`write.target-file-size-bytes`) natively.
- Ensures atomicity by executing within a table transaction.
- Explicitly sets `snapshot-type: replace` and `replace-operation: compaction`
to ensure correct metadata history for downstream engines.
- Includes a guard to safely ignore compaction requests on empty tables.
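For reviewers who want the flow in one place, the steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `compact_whole_table` is a hypothetical helper name, the empty-table check via `num_rows` is an assumption about how the guard could be written, and the snapshot property names are taken from the description above. It relies only on `Table.scan().to_arrow()`, `Table.transaction()`, and `Transaction.overwrite(..., snapshot_properties=...)`.

```python
def compact_whole_table(table):
    """Hypothetical sketch (not the PR's code): rewrite all rows in one commit.

    `table` is expected to behave like a pyiceberg.table.Table.
    Returns True if a rewrite was committed, False if the table was empty.
    """
    # Read the full table state into memory as a pyarrow.Table.
    data = table.scan().to_arrow()

    # Guard (assumed form): nothing to rewrite on an empty table.
    if data.num_rows == 0:
        return False

    # Property names and values as quoted in the PR description.
    snapshot_properties = {
        "snapshot-type": "replace",
        "replace-operation": "compaction",
    }

    # A single transaction, so the rewrite commits atomically or not at all.
    with table.transaction() as txn:
        txn.overwrite(data, snapshot_properties=snapshot_properties)
    return True
```

Because everything goes through one `overwrite`, output file sizing is left to the writer's existing `write.target-file-size-bytes` handling rather than to any compaction-specific logic.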
## Are these changes tested?
Yes. Pytest coverage for the new method is included in `tests/table/test_maintenance.py`.
## Are there any user-facing changes?
Yes. This PR adds a new `compact()` method to the MaintenanceTable API,
allowing users to perform file compaction on existing Iceberg tables.
Example usage:
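```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")  # assumes a configured catalog named "default"
table = catalog.load_table("default.my_table")
# Merges small files into larger ones based on table properties
table.maintenance.compact()
```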
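Since output file size is driven by table properties, a caller may want to set `write.target-file-size-bytes` before compacting. The sketch below shows one way to do that; `compact_with_target_size` is a hypothetical helper, not part of this PR, and it assumes `table.maintenance.compact()` takes no arguments, as in the example above. The property update uses the existing `Transaction.set_properties` call.

```python
def compact_with_target_size(table, target_bytes):
    """Hypothetical helper: set the target file size, then compact.

    `table` is expected to behave like a pyiceberg.table.Table that
    exposes the `maintenance.compact()` method added in this PR.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    # Table properties are stored as strings.
    with table.transaction() as txn:
        txn.set_properties({"write.target-file-size-bytes": str(target_bytes)})

    # Rewrite the table so files are packed toward the new target size.
    table.maintenance.compact()
```

For example, `compact_with_target_size(table, 128 * 1024 * 1024)` would aim for files of roughly 128 MiB.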
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]