liujiwen-up opened a new pull request, #341:
URL: https://github.com/apache/paimon-rust/pull/341

   Purpose
   Linked issue: close #xxx
   
   Fix $physical_files_size over-counting non-data files as data files by 
replacing basename-only fallback classification with path-aware physical file 
classification.
   
   Brief change log
   - Stop treating unknown physical files as data files.
   - Classify physical files by table-relative path:
     - manifest/manifest-*, manifest/manifest-list-*, and
       manifest/index-manifest-* as manifest files.
     - statistics/* counted in the manifest counters, to stay compatible
       with the current output schema.
     - index/* as index files.
     - bucket-*/* and bucket-postpone/* at the expected partition depth as
       data files.
   - Pass the table partition depth from the DataFusion
     $physical_files_size system table.
   - Update the $physical_files_size documentation to clarify that it is a
     diagnostic summary, not an orphan cleanup plan.
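
   The classification rules above can be sketched roughly as follows. This
   is an illustrative sketch only, not the code in this PR: the FileKind
   enum and the classify and is_bucket_dir helpers are hypothetical names,
   and it assumes that statistics/* and index/* hold files directly under
   those directories and that a data file sits exactly one level below its
   bucket directory.

```rust
/// Coarse category for a physical file (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileKind {
    Manifest,
    Index,
    Data,
    Ignored,
}

/// Returns true for directory names such as "bucket-0" or "bucket-postpone".
fn is_bucket_dir(name: &str) -> bool {
    name.strip_prefix("bucket-").map_or(false, |rest| !rest.is_empty())
}

/// Classify a table-relative path, e.g. "dt=2024-01-01/bucket-0/data-1.parquet".
/// `partition_depth` is the number of partition directories expected above
/// the bucket directory (0 for an unpartitioned table).
fn classify(rel_path: &str, partition_depth: usize) -> FileKind {
    let parts: Vec<&str> = rel_path.split('/').filter(|s| !s.is_empty()).collect();
    match parts.as_slice() {
        // "manifest-list-*" also starts with "manifest-", so one check covers both.
        ["manifest", name]
            if name.starts_with("manifest-") || name.starts_with("index-manifest-") =>
        {
            FileKind::Manifest
        }
        // Statistics files are folded into the manifest counters.
        ["statistics", _] => FileKind::Manifest,
        ["index", _] => FileKind::Index,
        _ => {
            // Expected shape: <partition dirs...>/<bucket dir>/<file>.
            if parts.len() == partition_depth + 2 && is_bucket_dir(parts[partition_depth]) {
                FileKind::Data
            } else {
                // Unknown files are ignored rather than counted as data.
                FileKind::Ignored
            }
        }
    }
}
```

   Under these assumptions, classify("data-1.parquet", 0) yields
   FileKind::Ignored because the file is not inside a bucket directory,
   while classify("dt=1/bucket-0/part.orc", 1) yields FileKind::Data even
   though the file name has no data-* prefix.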
   Tests
   - cargo fmt --check
   - cargo test -p paimon table::referenced_files::tests
   - cargo check -p paimon-datafusion

   Added unit coverage for:

   - Root-level data-* not being counted as data.
   - Unknown files such as _SUCCESS, schema, snapshot, tag, and random
     files being ignored.
   - manifest-list-* counted as manifest.
   - statistics/* counted through the compatible manifest counters.
   - Bucket data files counted even without a data-* prefix.
   - Partition-depth-aware bucket classification.
   API and Format
   This changes the Rust API signature of collect_physical_files_summary by
   adding a partition_depth parameter.
   
   No storage format changes.
   
   Documentation
   Updated SQL system table docs for $physical_files_size to describe 
path-aware classification and clarify the diagnostic nature of the summary.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.