zhuqi-lucas commented on PR #19545:
URL: https://github.com/apache/datafusion/pull/19545#issuecomment-3698650308
Great work on implementing list type pushdown! I suggest adding a benchmark to demonstrate the performance improvement more clearly. Here's a proposed benchmark scenario:

**Benchmark Setup:**

1. Create a dataset with a `List<String>` column sorted by list value (lexicographic order).
2. Use a row group size that produces multiple row groups (e.g., 10K rows per group, 100K rows total).
3. Apply a selective filter such as `array_has(list_col, 'target_value')` that matches only ~10% of the row groups.

**Expected Results:**

- **Without pushdown**: every row group must be decoded and filtered → baseline time
- **With pushdown**: ~90% of the row groups are skipped based on min/max statistics → faster execution
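To make the expected numbers concrete before wiring up the real benchmark, here is a rough Python model of the scenario above. It is only a sketch under stated assumptions: it does not call DataFusion or any Parquet library. It assumes each row group keeps min/max statistics over the leaf string values of the list column, and that `array_has(list_col, target)` allows a row group to be skipped when `target` falls outside that range. The helper names (`make_sorted_list_column`, `row_group_stats`, `prune`) and the `key_NNNN` values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical parameters mirroring the proposed benchmark setup.
TOTAL_ROWS = 100_000
ROWS_PER_GROUP = 10_000
ROWS_PER_KEY = 100  # each distinct key appears in 100 consecutive rows


@dataclass
class RowGroupStats:
    """Min/max over the leaf string values of the list column in one row group."""
    min_value: str
    max_value: str


def make_sorted_list_column(total_rows: int) -> list[list[str]]:
    """Build a List<String>-like column and sort rows lexicographically by list value."""
    rows = [
        [f"key_{i // ROWS_PER_KEY:04d}", f"key_{i // ROWS_PER_KEY:04d}_tag"]
        for i in range(total_rows)
    ]
    rows.sort()  # Python compares lists element by element (lexicographic order)
    return rows


def row_group_stats(rows: list[list[str]], rows_per_group: int) -> list[RowGroupStats]:
    """Split rows into fixed-size row groups and record leaf-value min/max per group."""
    stats = []
    for start in range(0, len(rows), rows_per_group):
        leaves = [value for row in rows[start:start + rows_per_group] for value in row]
        stats.append(RowGroupStats(min(leaves), max(leaves)))
    return stats


def prune(stats: list[RowGroupStats], target: str) -> list[int]:
    """Return indices of row groups that may contain `target` and must be read."""
    # array_has(list_col, target) can only be true if some leaf equals target,
    # so a group whose [min, max] range excludes target can be skipped.
    return [i for i, s in enumerate(stats) if s.min_value <= target <= s.max_value]


rows = make_sorted_list_column(TOTAL_ROWS)
stats = row_group_stats(rows, ROWS_PER_GROUP)
kept = prune(stats, "key_0250")
print(f"{len(stats)} row groups, {len(kept)} read, {len(stats) - len(kept)} skipped")
# prints "10 row groups, 1 read, 9 skipped"
```

With 1,000 distinct keys spread over 10 row groups, the target lands in exactly one group, so 9 of 10 groups (~90%) can be skipped. Min/max pruning is conservative: a group that is kept may still contain no matching rows, so it still has to be decoded and filtered as usual.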
