liujiwen-up opened a new pull request, #341:
URL: https://github.com/apache/paimon-rust/pull/341

Purpose

Linked issue: close #xxx

Fix $physical_files_size over-counting non-data files as data files by replacing basename-only fallback classification with path-aware physical file classification.

Brief change log

- Stop treating unknown physical files as data files.
- Classify physical files by table-relative path:
  - manifest/manifest-*, manifest/manifest-list-*, manifest/index-manifest-* as manifest files.
  - statistics/* into manifest counters for current output schema compatibility.
  - index/* as index files.
  - partition-aware bucket-*/* and bucket-postpone/* as data files.
- Pass table partition depth from the DataFusion $physical_files_size system table.
- Update $physical_files_size documentation to clarify that it is a diagnostic summary, not an orphan cleanup plan.

Tests

- cargo fmt --check
- cargo test -p paimon table::referenced_files::tests
- cargo check -p paimon-datafusion

Added unit coverage for:

- Root-level data-* not being counted as data.
- Unknown files such as _SUCCESS, schema, snapshot, tag, and random files being ignored.
- manifest-list-* counted as manifest.
- statistics/* counted through compatible manifest counters.
- Bucket data files counted even without a data-* prefix.
- Partition-depth-aware bucket classification.

API and Format

This changes the Rust API signature of collect_physical_files_summary by adding partition_depth. No storage format changes.

Documentation

Updated SQL system table docs for $physical_files_size to describe path-aware classification and clarify the diagnostic nature of the summary.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
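
For readers following the classification rules in the change log, here is a minimal sketch of how path-aware classification with a partition depth could look. It is not the code from this pull request: the names PhysicalFileKind, classify, and is_bucket_dir are hypothetical, and the sketch assumes that bucket directories are either bucket-postpone or bucket- followed by digits, and that a data file sits exactly partition_depth directories below the table root, inside a bucket directory. Partition directory names themselves are not validated.

```rust
/// Hypothetical categories for this sketch; the PR's real types may differ.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum PhysicalFileKind {
    Manifest,
    Index,
    Data,
}

/// Assumption: a bucket directory is `bucket-postpone` or `bucket-<digits>`.
fn is_bucket_dir(segment: &str) -> bool {
    segment == "bucket-postpone"
        || segment
            .strip_prefix("bucket-")
            .is_some_and(|rest| !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()))
}

/// Classify a table-relative path. Returns `None` for unknown files,
/// which are ignored instead of being counted as data.
fn classify(relative_path: &str, partition_depth: usize) -> Option<PhysicalFileKind> {
    let parts: Vec<&str> = relative_path
        .split('/')
        .filter(|segment| !segment.is_empty())
        .collect();

    // Metadata directories live directly under the table root.
    if let [dir, name] = parts.as_slice() {
        match *dir {
            // `manifest-list-*` also starts with `manifest-`, so one check covers both.
            "manifest"
                if name.starts_with("manifest-") || name.starts_with("index-manifest-") =>
            {
                return Some(PhysicalFileKind::Manifest)
            }
            // Statistics files are folded into the manifest counters so the
            // output schema stays unchanged.
            "statistics" => return Some(PhysicalFileKind::Manifest),
            "index" => return Some(PhysicalFileKind::Index),
            _ => {}
        }
    }

    // Data files: <partition dirs x partition_depth>/<bucket dir>/<file name>.
    if parts.len() == partition_depth + 2 && is_bucket_dir(parts[partition_depth]) {
        return Some(PhysicalFileKind::Data);
    }

    None
}
```

Under these assumptions, classify("dt=2024-01-01/bucket-0/part.orc", 1) is treated as data, while the same path with a partition depth of 0 is ignored, as are root-level data-* files and markers such as _SUCCESS.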
