QuakeWang opened a new issue, #7998:
URL: https://github.com/apache/paimon/issues/7998

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Motivation
   
   Daft's Paimon reader currently decides internally whether each split can use 
Daft's native Parquet reader or must fall back to the pypaimon reader.
   
   This decision matters for performance, but it is not observable through 
a public diagnostic API. Users cannot easily tell whether a slow scan is caused 
by PK merge, deletion vectors, BLOB columns, a non-Parquet format, or unsupported 
filter pushdown.
   
   Although pypaimon already has `ReadBuilder.explain()`, it explains the 
Paimon scan plan only. It does not expose Daft-specific reader routing, such as 
native Parquet vs pypaimon fallback.
   
   ### Solution
   
   Add a Daft-side structured scan explain API, for example:
   
   - `explain_paimon_scan(...)`
   - or `PaimonTable.explain_scan(...)`
   - or a structured explain method on `PaimonDataSource` exposed through 
the public API
   
   The explain result should include at least:
   
   - Paimon scan explain information from `ReadBuilder.explain()`
   - native Parquet split count
   - pypaimon fallback split count
   - fallback reason summary, e.g. PK merge, deletion vectors, BLOB columns, 
non-Parquet format
   - pushed and remaining Daft filters
   - projection and limit pushdown status
   - optional verbose per-split reader mode and fallback reason
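
To make the field list above concrete, here is a minimal sketch of what a structured result could look like. Every class, field, and function name below (`SplitExplain`, `PaimonScanExplain`, `summarize`, the reason strings) is hypothetical and not an existing Daft or pypaimon API. The output of `ReadBuilder.explain()` is assumed to be available as text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SplitExplain:
    """Per-split detail, only kept when verbose output is requested."""
    split_id: str
    reader_mode: str  # "native_parquet" or "pypaimon_fallback"
    fallback_reason: Optional[str] = None  # e.g. "pk_merge"; None for native splits


@dataclass
class PaimonScanExplain:
    """Hypothetical result object; every field name here is an assumption."""
    paimon_explain: str  # text assumed to come from ReadBuilder.explain()
    native_split_count: int = 0
    fallback_split_count: int = 0
    fallback_reasons: Dict[str, int] = field(default_factory=dict)
    pushed_filters: List[str] = field(default_factory=list)
    remaining_filters: List[str] = field(default_factory=list)
    projection_pushed: bool = False
    limit_pushed: bool = False
    splits: List[SplitExplain] = field(default_factory=list)


def summarize(paimon_explain: str, splits: List[SplitExplain],
              verbose: bool = False, **pushdown) -> PaimonScanExplain:
    """Aggregate per-split decisions into the summary counts."""
    reasons = Counter(s.fallback_reason for s in splits if s.fallback_reason)
    fallback = sum(1 for s in splits if s.reader_mode == "pypaimon_fallback")
    return PaimonScanExplain(
        paimon_explain=paimon_explain,
        native_split_count=len(splits) - fallback,
        fallback_split_count=fallback,
        fallback_reasons=dict(reasons),
        splits=list(splits) if verbose else [],  # per-split detail is opt-in
        **pushdown,
    )


demo = summarize(
    "<output of ReadBuilder.explain()>",
    [
        SplitExplain("s0", "native_parquet"),
        SplitExplain("s1", "pypaimon_fallback", "pk_merge"),
        SplitExplain("s2", "pypaimon_fallback", "deletion_vectors"),
    ],
    pushed_filters=["id > 10"],
    projection_pushed=True,
)
```

Keeping the per-split list empty unless `verbose` is set keeps the default result small for tables with many splits.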
   
   The implementation should reuse the same native/fallback decision logic as 
`PaimonDataSource.get_tasks()` to avoid divergence between diagnostics and 
actual execution.
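
One way to avoid that divergence is to route both paths through a single helper that returns the fallback reason. The sketch below is only an illustration: `SplitInfo`, `fallback_reason`, `plan_tasks`, and `explain_routing` are hypothetical names, the split attributes are assumed, and the order of the checks is arbitrary rather than taken from the current `get_tasks()` implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SplitInfo:
    """Hypothetical stand-in for the split properties the router inspects."""
    file_format: str
    needs_pk_merge: bool = False
    has_deletion_vectors: bool = False
    has_blob_columns: bool = False


def fallback_reason(split: SplitInfo) -> Optional[str]:
    """Single source of truth: None means the native Parquet reader applies."""
    if split.file_format.lower() != "parquet":
        return "non_parquet_format"
    if split.needs_pk_merge:
        return "pk_merge"
    if split.has_deletion_vectors:
        return "deletion_vectors"
    if split.has_blob_columns:
        return "blob_columns"
    return None


def plan_tasks(splits: List[SplitInfo]) -> List[Tuple[str, SplitInfo]]:
    """Stand-in for the execution path (the role get_tasks() plays)."""
    return [
        ("native_parquet" if fallback_reason(s) is None else "pypaimon_fallback", s)
        for s in splits
    ]


def explain_routing(splits: List[SplitInfo]) -> Dict[str, int]:
    """Diagnostic path: counts reader modes and reasons via the same helper."""
    counts: Counter = Counter()
    for s in splits:
        reason = fallback_reason(s)
        counts["native_parquet" if reason is None else "pypaimon_fallback"] += 1
        if reason is not None:
            counts["reason:" + reason] += 1
    return dict(counts)


example_splits = [
    SplitInfo("parquet"),
    SplitInfo("parquet", needs_pk_merge=True),
    SplitInfo("orc"),
]
```

Because `plan_tasks` and `explain_routing` both call `fallback_reason`, a new fallback condition only has to be added in one place to show up in both execution and diagnostics.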
   
   
   ### Anything else?
   
   Relevant code paths:
   
   - `paimon-python/pypaimon/daft/daft_datasource.py`
   - `paimon-python/pypaimon/read/read_builder.py`
   - `paimon-python/pypaimon/daft/daft_predicate_visitor.py`
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
