kumarUjjawal commented on issue #18578:
URL: https://github.com/apache/datafusion/issues/18578#issuecomment-4508190424

   ## Triage
   
   This is a bug in the **example code**, not in DataFusion core. The 
`custom_datasource.rs` example ignores the projection pushdown in its 
`execute()` method.
   
   ### Root cause
   
   In 
[`datafusion-examples/examples/custom_data_source/custom_datasource.rs`](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/custom_data_source/custom_datasource.rs):
   
   - `CustomExec::new` (L201) computes a `projected_schema` from the projection, 
which is correct.
   - `CustomExec::execute` (L248–277) always builds a `RecordBatch` with 
**both** `id` and `bank_account` columns, regardless of projection.
   - The `RecordBatch` is constructed with `self.projected_schema.clone()` 
(L268) but always receives 2 column arrays, so when the projection requests a 
subset, the schema's field count and the number of arrays diverge.
   
   ### Why each failing query trips it
   
   | Query | Projection from planner | Result |
   |---|---|---|
   | `select * from test` | `None` | full schema, 2 columns; works |
   | `select 1 a from test` | `Some([])` (no source cols needed) | `projected_schema` has 0 fields, exec returns 2 arrays; `RecordBatch` construction fails |
   | `select COUNT(a) from test` | `Some([0])` (only `id`) | `projected_schema` has 1 field, exec returns 2 arrays; fails |
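
The three rows can be reproduced without Arrow. What follows is a minimal, dependency-free sketch: plain `Vec`s stand in for Arrow arrays, and `projected_fields`, `try_new_batch`, and `execute_ignoring_projection` are hypothetical names that only model the field-count check, not the actual DataFusion or Arrow API. The sample values are made up.

```rust
/// Field names of the projected schema for a given projection.
fn projected_fields(fields: &[&str], projection: Option<&Vec<usize>>) -> Vec<String> {
    match projection {
        None => fields.iter().map(|f| f.to_string()).collect(),
        Some(idx) => idx.iter().map(|&i| fields[i].to_string()).collect(),
    }
}

/// Hypothetical stand-in for the check done when a batch is constructed:
/// the number of columns must equal the number of schema fields.
fn try_new_batch(schema: &[String], columns: &[Vec<u64>]) -> Result<(), String> {
    if schema.len() != columns.len() {
        return Err(format!(
            "schema has {} fields but {} columns were supplied",
            schema.len(),
            columns.len()
        ));
    }
    Ok(())
}

/// Models the current behaviour: the schema is projected, the columns are not.
fn execute_ignoring_projection(projection: Option<&Vec<usize>>) -> Result<(), String> {
    let schema = projected_fields(&["id", "bank_account"], projection);
    // Both columns are always built, whatever the projection says.
    let columns: Vec<Vec<u64>> = vec![vec![1, 2, 3], vec![9000, 100, 1000]];
    try_new_batch(&schema, &columns)
}

fn main() {
    // `select * from test`: no projection, 2 fields vs 2 columns.
    println!("{:?}", execute_ignoring_projection(None));
    // `select 1 a from test`: empty projection, 0 fields vs 2 columns.
    println!("{:?}", execute_ignoring_projection(Some(&vec![])));
    // `select COUNT(a) from test`: only column 0, 1 field vs 2 columns.
    println!("{:?}", execute_ignoring_projection(Some(&vec![0])));
}
```

Only the first call returns `Ok`; the other two fail because the column count no longer matches the projected schema.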
   
   The issue title is misleading: UDAFs work fine; the example's `CustomExec` 
simply doesn't honor the projection.
   
   ### Fix
   
   `execute()` should either:
   1. Build all columns then project using the projection indices stored on 
`CustomExec`, or
   2. Build only the columns listed in the projection.
   
   `CustomExec` needs to store `projections: Option<Vec<usize>>` (it 
currently drops it after computing `projected_schema`) so that `execute()` can 
select the right column subset.
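
Option 1 could look roughly like the sketch below. It is dependency-free on purpose: `CustomExecModel` and `Batch` are hypothetical stand-ins for the example's `CustomExec` and Arrow's `RecordBatch`, plain `Vec`s replace Arrow arrays, and the sample values are made up. The point is only that the stored projection drives both the schema and the column selection.

```rust
/// Hypothetical stand-in for a record batch: columns plus an explicit row count.
struct Batch {
    num_rows: usize,
    columns: Vec<Vec<u64>>,
}

/// Hypothetical model of `CustomExec` that keeps the projection around.
struct CustomExecModel {
    /// Projection kept from planning time instead of being dropped.
    projections: Option<Vec<usize>>,
    /// Field names after applying the projection.
    projected_schema: Vec<String>,
}

impl CustomExecModel {
    fn new(fields: &[&str], projections: Option<Vec<usize>>) -> Self {
        let projected_schema = match &projections {
            None => fields.iter().map(|f| f.to_string()).collect(),
            Some(idx) => idx.iter().map(|&i| fields[i].to_string()).collect(),
        };
        Self { projections, projected_schema }
    }

    /// Builds every column, then keeps only the projected ones, in order.
    fn execute(&self) -> Batch {
        let all_columns: Vec<Vec<u64>> = vec![vec![1, 2, 3], vec![9000, 100, 1000]];
        let num_rows = all_columns[0].len();
        let columns: Vec<Vec<u64>> = match &self.projections {
            None => all_columns,
            Some(idx) => idx.iter().map(|&i| all_columns[i].clone()).collect(),
        };
        // Schema and columns now come from the same projection, so they agree.
        assert_eq!(columns.len(), self.projected_schema.len());
        Batch { num_rows, columns }
    }
}

fn main() {
    let cases: Vec<Option<Vec<usize>>> = vec![None, Some(vec![]), Some(vec![0])];
    for projection in cases {
        let exec = CustomExecModel::new(&["id", "bank_account"], projection.clone());
        let batch = exec.execute();
        println!(
            "{:?} -> {} columns, {} rows",
            projection,
            batch.columns.len(),
            batch.num_rows
        );
    }
}
```

The model carries `num_rows` separately because the empty projection `Some([])` yields no columns to infer a length from. The real example would probably need to pass the row count explicitly in that case too, though that should be checked against the Arrow `RecordBatch` API.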
   
   Happy to send a PR fixing the example.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

