QuakeWang opened a new issue, #7998: URL: https://github.com/apache/paimon/issues/7998
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Motivation

Daft's Paimon reader currently decides internally whether each split can use Daft's native Parquet reader or must fall back to the pypaimon reader. This decision is important for performance, but it is not observable through a public diagnostic API. Users cannot easily tell whether a slow scan is caused by PK merge, deletion vectors, BLOB columns, a non-Parquet format, or unsupported filter pushdown.

Although pypaimon already has `ReadBuilder.explain()`, it explains the Paimon scan plan only. It does not expose Daft-specific reader routing, such as native Parquet vs. pypaimon fallback.

### Solution

Add a Daft-side structured scan explain API, for example:

- `explain_paimon_scan(...)`
- or `PaimonTable.explain_scan(...)`
- or a structured explain method on `PaimonDataSource` exposed through a public API

The explain result should include at least:

- Paimon scan explain information from `ReadBuilder.explain()`
- native Parquet split count
- pypaimon fallback split count
- fallback reason summary, e.g. PK merge, deletion vectors, BLOB columns, non-Parquet format
- pushed and remaining Daft filters
- projection and limit pushdown status
- optional verbose per-split reader mode and fallback reason

The implementation should reuse the same native/fallback decision logic as `PaimonDataSource.get_tasks()` to avoid divergence between diagnostics and actual execution.

### Anything else?

Relevant code paths:

- `paimon-python/pypaimon/daft/daft_datasource.py`
- `paimon-python/pypaimon/read/read_builder.py`
- `paimon-python/pypaimon/daft/daft_predicate_visitor.py`

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
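### Illustrative sketch of the explain result

One possible shape for the structured explain result listed under Solution, as a minimal sketch only. All names here (`SplitDecision`, `PaimonScanExplain`, `build_explain`, the reader labels, and the reason strings) are hypothetical and do not exist in pypaimon or Daft today. The per-split decisions are assumed to be produced by the same routing logic that `PaimonDataSource.get_tasks()` uses, and `paimon_explain` is assumed to be whatever `ReadBuilder.explain()` returns, rendered as text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical reader labels; the real implementation may name these differently.
NATIVE = "native_parquet"
FALLBACK = "pypaimon_fallback"


@dataclass
class SplitDecision:
    """Hypothetical per-split routing decision (not an existing pypaimon type)."""
    split_id: str
    reader: str                            # NATIVE or FALLBACK
    fallback_reason: Optional[str] = None  # e.g. "pk_merge", "deletion_vectors"


@dataclass
class PaimonScanExplain:
    """Hypothetical structured explain result covering the fields in this issue."""
    paimon_explain: str                    # assumed: text from ReadBuilder.explain()
    native_split_count: int
    fallback_split_count: int
    fallback_reasons: Dict[str, int]       # reason -> number of splits
    pushed_filters: List[str]
    remaining_filters: List[str]
    projection_pushed: bool
    limit_pushed: bool
    splits: List[SplitDecision] = field(default_factory=list)  # verbose mode only


def build_explain(
    decisions: List[SplitDecision],
    paimon_explain: str,
    pushed_filters: List[str],
    remaining_filters: List[str],
    projection_pushed: bool,
    limit_pushed: bool,
    verbose: bool = False,
) -> PaimonScanExplain:
    """Summarize per-split routing decisions into one explain result."""
    fallbacks = [d for d in decisions if d.reader == FALLBACK]
    # Count splits per fallback reason; a missing reason is reported as "unknown".
    reasons = Counter(d.fallback_reason or "unknown" for d in fallbacks)
    return PaimonScanExplain(
        paimon_explain=paimon_explain,
        native_split_count=len(decisions) - len(fallbacks),
        fallback_split_count=len(fallbacks),
        fallback_reasons=dict(reasons),
        pushed_filters=list(pushed_filters),
        remaining_filters=list(remaining_filters),
        projection_pushed=projection_pushed,
        limit_pushed=limit_pushed,
        # Per-split details are attached only when the caller asks for them.
        splits=list(decisions) if verbose else [],
    )
```

Keeping the summary a pure function over a list of decisions means the diagnostics path and `get_tasks()` could share one decision routine, with the explain API only aggregating its output.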
