andygrove opened a new pull request, #3413:
URL: https://github.com/apache/datafusion-comet/pull/3413

   ## Which issue does this PR close?
   
   N/A - performance optimization
   
   ## Summary
   
   - Cache the `DataType` parsed from the first FFI batch in `ScanExec` and 
reuse it on subsequent batches via `from_ffi_and_data_type`, skipping the 
redundant `FFI_ArrowSchema` → `DataType` parsing
   - Fall back to full `from_ffi` schema parsing when the dictionary-encoding 
status changes between Parquet row groups (detected by checking the FFI array's 
dictionary pointer)
   - Update the cached type whenever a fallback occurs, so that subsequent 
batches benefit from the cache again
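
The caching rule in the list above can be sketched roughly as follows. This is a std-only illustration, not the code in this PR: `DataType`, `FfiColumn`, `parse_schema`, and `TypeCache` are hypothetical stand-ins for the arrow-rs types and for the `from_ffi` / `from_ffi_and_data_type` calls, and the `has_dictionary` flag is assumed to mirror whether the FFI array's dictionary pointer is set.

```rust
/// Hypothetical stand-in for the logical type parsed from an `FFI_ArrowSchema`.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
    Dictionary(Box<DataType>),
}

/// Hypothetical stand-in for one column of an imported FFI batch.
pub struct FfiColumn {
    /// Value type that the accompanying schema describes.
    pub value_type: DataType,
    /// Assumed to mirror "the FFI array's dictionary pointer is set".
    pub has_dictionary: bool,
}

/// Simulates the full schema -> `DataType` parse that the cache tries to skip.
fn parse_schema(col: &FfiColumn) -> DataType {
    if col.has_dictionary {
        DataType::Dictionary(Box::new(col.value_type.clone()))
    } else {
        col.value_type.clone()
    }
}

/// Per-column cache of the last parsed type, plus a counter for illustration.
#[derive(Default)]
pub struct TypeCache {
    cached: Option<DataType>,
    pub full_parses: usize,
}

impl TypeCache {
    /// Returns the type to import the batch with, reusing the cached type
    /// unless the dictionary-encoding status differs from the cached one.
    pub fn resolve(&mut self, col: &FfiColumn) -> DataType {
        if let Some(dt) = &self.cached {
            let cached_is_dict = matches!(dt, DataType::Dictionary(_));
            if cached_is_dict == col.has_dictionary {
                // Fast path: the cached type still matches this batch.
                return dt.clone();
            }
        }
        // Slow path: first batch, or the encoding status changed.
        // Re-parse and refresh the cache so later batches hit the fast path.
        let parsed = parse_schema(col);
        self.full_parses += 1;
        self.cached = Some(parsed.clone());
        parsed
    }
}
```

In this sketch, calling `resolve` repeatedly on plain batches increments `full_parses` only once; a dictionary-encoded batch then triggers one more full parse, after which further dictionary-encoded batches are served from the cache.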
   
   ## How was this patch tested?
   
   - `cargo build` / `cargo fmt` / `cargo clippy` — clean
   - `CometFuzzTestSuite` — 30/30 tests pass
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
