korowa commented on code in PR #13788:
URL: https://github.com/apache/datafusion/pull/13788#discussion_r1885720096


##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -843,8 +843,16 @@ impl TableProvider for ListingTable {
             });
        // TODO (https://github.com/apache/datafusion/issues/11600) remove downcast_ref from here?
        let session_state = state.as_any().downcast_ref::<SessionState>().unwrap();
+
+        // We should not limit the number of partitioned files to scan if there are filters and limit
+        // at the same time. This is because the limit should be applied after the filters are applied.
+        let mut statistic_file_limit = limit;

Review Comment:
   > if we can do this fix in the planner and an optimizer rule, it will be more reasonable
   
   I've just realized that the current behavior of the optimizer is reasonable, and I don't think it should be changed:
   - if we are not able to push down the filters, then we don't push down the limit either (since the filter will be applied on top of the scan, and the limit on top of the filter)
   - otherwise both the filter and the limit are pushed down to the scan, and from that point it's the table provider's responsibility to handle these arguments (or perhaps to return an error if either of them is not supported).
   
   In this context, this is indeed a bug in `ListingTable`, since it prunes out too much data, and your fix looks correct.
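   
   To make this concrete, here is a minimal standalone sketch (not the actual `ListingTable` code; `prune_files`, `filters_present`, and the `PartitionedFile` struct here are illustrative stand-ins) of how a provider can skip the statistics-based file limit whenever filters are also pushed down:
   
   ```rust
   // Minimal sketch, under the assumptions above: it only illustrates why a
   // pushed-down limit must not prune input files when filters are pushed down too.
   
   #[allow(dead_code)]
   struct PartitionedFile {
       path: String,
       num_rows: Option<usize>, // row count from file statistics, if known
   }
   
   /// Hypothetical helper: decide which files to scan, given whether filters were
   /// pushed down and the pushed-down `limit` (names are illustrative).
   fn prune_files(
       files: Vec<PartitionedFile>,
       filters_present: bool,
       limit: Option<usize>,
   ) -> Vec<PartitionedFile> {
       // With filters inside the scan, the limit applies to the *filtered* rows,
       // so it cannot be used to cut off input files based on raw row counts.
       let statistic_file_limit = if filters_present { None } else { limit };
   
       match statistic_file_limit {
           None => files,
           Some(n) => {
               // Keep files until their known row counts cover the limit;
               // files without statistics add nothing to the running total,
               // so they are kept conservatively.
               let mut kept = Vec::new();
               let mut rows_so_far = 0usize;
               for file in files {
                   if rows_so_far >= n {
                       break;
                   }
                   rows_so_far += file.num_rows.unwrap_or(0);
                   kept.push(file);
               }
               kept
           }
       }
   }
   
   fn main() {
       let files = || vec![
           PartitionedFile { path: "part-0.parquet".into(), num_rows: Some(10) },
           PartitionedFile { path: "part-1.parquet".into(), num_rows: Some(10) },
       ];
       // LIMIT 5 without filters: the first file already has enough rows.
       assert_eq!(prune_files(files(), false, Some(5)).len(), 1);
       // LIMIT 5 with a filter: both files must still be scanned.
       assert_eq!(prune_files(files(), true, Some(5)).len(), 2);
   }
   ```
   
   The point of the sketch is the single guard on `statistic_file_limit`: the same pruning logic stays in place for the limit-only case, while a limit combined with filters leaves the file list untouched.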


