Re: [PR] Parquet: Push down supported list predicates (array_has/any/all) during decoding [datafusion]

via GitHub Tue, 30 Dec 2025 00:32:51 -0800


zhuqi-lucas commented on PR #19545:
URL: https://github.com/apache/datafusion/pull/19545#issuecomment-3698650308


   Great work on implementing list predicate pushdown!
   
   I suggest adding a benchmark to demonstrate the performance improvement more clearly. Here's a proposed test scenario:
   
   **Benchmark Setup:**
   1. Create a dataset with a `List<String>` column sorted by list value (lexicographic order).
   2. Use a row group size that yields multiple row groups (e.g., 10K rows per group and 100K rows total, i.e., 10 row groups).
   3. Apply a selective filter such as `array_has(list_col, 'target_value')` that matches only ~10% of row groups.
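The setup above can be modeled without any dependencies. The following is a minimal, std-only Rust sketch under stated assumptions: it does not use DataFusion, Arrow, or the `parquet` crate, and the value format (`v000000`-style strings, three elements per row) as well as `make_row`, `row_group_stats`, and `build_dataset` are hypothetical names chosen for illustration. It builds 100K sorted `List<String>` rows, splits them into 10K-row groups, and computes each group's min/max over the list elements, roughly the information a Parquet reader would have as statistics for the element column.

```rust
// Hypothetical benchmark layout: 100K rows of List<String>, sorted, in 10K-row groups.
// Std-only model; a real benchmark would write these rows to a Parquet file instead.

const TOTAL_ROWS: usize = 100_000;
const ROWS_PER_GROUP: usize = 10_000;

/// Min/max over the list *elements* of one row group (hypothetical struct).
#[derive(Debug)]
struct RowGroupStats {
    min: String,
    max: String,
}

/// Row `i` holds three zero-padded values (assumed format), so rows in
/// increasing `i` order are also sorted lexicographically by list value.
fn make_row(i: usize) -> Vec<String> {
    (0..3).map(|k| format!("v{:06}", i * 3 + k)).collect()
}

/// Computes element-level min/max for one row group.
fn row_group_stats(rows: &[Vec<String>]) -> RowGroupStats {
    let min = rows.iter().flatten().min().expect("empty row group").clone();
    let max = rows.iter().flatten().max().expect("empty row group").clone();
    RowGroupStats { min, max }
}

/// Generates all rows, then returns statistics for each fixed-size row group.
fn build_dataset() -> Vec<RowGroupStats> {
    let rows: Vec<Vec<String>> = (0..TOTAL_ROWS).map(make_row).collect();
    rows.chunks(ROWS_PER_GROUP).map(row_group_stats).collect()
}

fn main() {
    let stats = build_dataset();
    println!("{} row groups", stats.len());
    for (i, s) in stats.iter().enumerate() {
        println!("row group {i}: min={} max={}", s.min, s.max);
    }
}
```

Because the rows are sorted, the groups cover disjoint, increasing value ranges: group 0 spans `v000000` to `v029999` and group 9 ends at `v299999`. An actual benchmark would presumably write the same rows to Parquet with a capped row-group size and time the query with pushdown enabled and disabled.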
   
   **Expected Results:**
   - **Without pushdown**: All row groups must be decoded and filtered → baseline time
   - **With pushdown**: ~90% of row groups skipped based on min/max statistics → faster execution
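The expected results above hinge on one check per row group. Below is a minimal std-only Rust sketch, assuming the filter can be reduced to a range test on the element column's min/max statistics (an assumption for illustration, not a description of what this PR implements): a row group can contain `target` only if `min <= target <= max`, so every other group is safe to skip. `RowGroupStats`, `may_contain`, `pruning_summary`, `sorted_groups`, and `unsorted_groups` are hypothetical names.

```rust
// Hypothetical model of min/max row-group pruning for `array_has(list_col, target)`.
// Std-only; not DataFusion's or the parquet crate's API.

/// Min/max of the list *elements* in one row group (hypothetical struct).
#[derive(Debug)]
struct RowGroupStats {
    min: String,
    max: String,
}

/// A group may hold a row whose list contains `target` only if
/// `min <= target <= max`. `true` means "must read", not "definitely matches".
fn may_contain(stats: &RowGroupStats, target: &str) -> bool {
    stats.min.as_str() <= target && target <= stats.max.as_str()
}

/// Returns (row groups that must be read, row groups that can be skipped).
fn pruning_summary(groups: &[RowGroupStats], target: &str) -> (usize, usize) {
    let read = groups.iter().filter(|g| may_contain(g, target)).count();
    (read, groups.len() - read)
}

/// Sorted data: group `g` covers the disjoint range [g*span, (g+1)*span - 1].
fn sorted_groups(n: usize, span: usize) -> Vec<RowGroupStats> {
    (0..n)
        .map(|g| RowGroupStats {
            min: format!("v{:06}", g * span),
            max: format!("v{:06}", (g + 1) * span - 1),
        })
        .collect()
}

/// Unsorted data (assumed worst case): every group spans the full value range.
fn unsorted_groups(n: usize, span: usize) -> Vec<RowGroupStats> {
    (0..n)
        .map(|_| RowGroupStats {
            min: format!("v{:06}", 0),
            max: format!("v{:06}", n * span - 1),
        })
        .collect()
}

fn main() {
    let target = "v123456";
    for (label, groups) in vec![
        ("sorted", sorted_groups(10, 30_000)),
        ("unsorted", unsorted_groups(10, 30_000)),
    ] {
        let (read, skipped) = pruning_summary(&groups, target);
        println!("{label}: read {read} row groups, skipped {skipped}");
    }
}
```

With sorted data and a target such as `v123456`, only the group covering `v120000` to `v149999` survives, so 9 of 10 groups (90%) are skipped; if every group spans the full value range, as tends to happen with unsorted data, nothing can be skipped. This is why the sorted layout in the setup matters.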
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

Re: [PR] Parquet: Push down supported list predicates (array_has/any/all) during decoding [datafusion]
