qzyu999 opened a new pull request, #3124:
URL: https://github.com/apache/iceberg-python/pull/3124

   
   <!-- Closes #1092 -->
   
   # Rationale for this change
   This PR introduces a simplified, whole-table compaction strategy, exposed through the `MaintenanceTable` API as `table.maintenance.compact()`.
   
   Key implementation details:
   - Reads the table's current data into memory as a PyArrow table via `.to_arrow()`.
     - Note: this initial implementation rewrites the data in memory with Arrow, so the whole table must fit in memory. Future iterations could add streaming or distributed rewrites for larger-than-memory tables.
   - Uses `table.overwrite()` to rewrite the data, relying on PyIceberg's built-in bin-packing to size output files according to `write.target-file-size-bytes`.
   - Runs inside a single table transaction, so the rewrite is committed atomically.
   - Explicitly sets `snapshot-type: replace` and `replace-operation: compaction` so that the snapshot history records the commit as a compaction for downstream engines.
   - Skips compaction (a no-op) when the table is empty.
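
The bin-packing and empty-table behavior above can be illustrated with a small, stdlib-only model. This is a sketch under stated assumptions, not the PR's code: `DataFile`, `CompactionPlan`, `bin_pack`, and `plan_compaction` are hypothetical names, sizes are plain integers, and packing is a simple greedy pass. PyIceberg's own packer works over Arrow record batches and is more elaborate, and the sizes of the written Parquet files can differ from the in-memory bytes because of encoding and compression.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for an Iceberg data file: a path and its size."""

    path: str
    size_bytes: int


@dataclass
class CompactionPlan:
    """Hypothetical result: files to remove, output file sizes, snapshot summary."""

    removed: list[DataFile]
    output_sizes: list[int]
    summary: dict[str, str] = field(default_factory=dict)


def bin_pack(sizes: list[int], target: int) -> list[list[int]]:
    """Greedy, order-preserving packing: close the current bin when adding the
    next item would exceed `target`. An item larger than `target` gets its own bin."""
    bins: list[list[int]] = []
    current: list[int] = []
    current_total = 0
    for size in sizes:
        if current and current_total + size > target:
            bins.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        bins.append(current)
    return bins


def plan_compaction(files: list[DataFile], target_file_size_bytes: int) -> CompactionPlan | None:
    """Plan a whole-table rewrite: every existing file is replaced by packed outputs."""
    # Empty-table guard: nothing to rewrite, so no plan (and no commit).
    if not files:
        return None
    bins = bin_pack([f.size_bytes for f in files], target_file_size_bytes)
    return CompactionPlan(
        removed=list(files),
        output_sizes=[sum(b) for b in bins],
        # Summary properties as named in the PR description.
        summary={"snapshot-type": "replace", "replace-operation": "compaction"},
    )


if __name__ == "__main__":
    small = [DataFile(f"part-{i}.parquet", s) for i, s in enumerate([40, 30, 50, 20, 90])]
    plan = plan_compaction(small, target_file_size_bytes=100)
    print(plan.output_sizes)  # [70, 70, 90]
```

The model also shows why this strategy is whole-table: every existing file ends up in `removed`, including files that were already close to the target size.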
   
   ## Are these changes tested?
   Adds pytest tests in `tests/table/test_maintenance.py`.
   
   ## Are there any user-facing changes?
   Yes. This PR adds a new `compact()` method to the `MaintenanceTable` API (`table.maintenance`), allowing users to compact the data files of existing Iceberg tables.
   
   Example usage:
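   ```python
   from pyiceberg.catalog import load_catalog

   catalog = load_catalog("default")  # catalog name and config depend on your setup
   table = catalog.load_table("default.my_table")
   # Rewrite small data files into larger ones sized by write.target-file-size-bytes
   table.maintenance.compact()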
   ```
   


-- 
This is an automated message from the Apache Git Service.