korowa commented on code in PR #13788:
URL: https://github.com/apache/datafusion/pull/13788#discussion_r1885720096
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -843,8 +843,16 @@ impl TableProvider for ListingTable {
});
// TODO (https://github.com/apache/datafusion/issues/11600) remove downcast_ref from here?
let session_state = state.as_any().downcast_ref::<SessionState>().unwrap();
+
+ // We should not limit the number of partitioned files to scan if there are filters and limit
+ // at the same time. This is because the limit should be applied after the filters are applied.
+ let mut statistic_file_limit = limit;
Review Comment:
> if we can done this fix in planner and optimize rule will be more reasonable

I've just realized that the current behavior of the optimizer is reasonable, and I don't think it should be changed:
- if we are not able to push down the filters, then we don't push down the limit either (since the filter will be applied on top of the scan, and the limit on top of the filter);
- otherwise, both the filter and the limit are pushed down to the scan, and from then on it's the responsibility of the table provider to handle these arguments (or to raise an error if any of them is not supported).

In this context, it is indeed a bug in the listing table, since it prunes out too much data, and your fix looks correct.
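
To illustrate the reasoning in a self-contained way, here is a minimal sketch (hypothetical names, not the actual `ListingTable` code or the DataFusion API): the limit may only bound the file-listing/statistics step when there are no pushed-down filters, because with filters the matching rows may live in any of the files.

```rust
// Hypothetical stand-in for a pushed-down filter expression.
struct Filter;

/// Decide how many files may be skipped based on row-count statistics.
/// If any filter is present, the limit only applies *after* filtering,
/// so every file must remain a candidate for the scan.
fn statistic_file_limit(filters: &[Filter], limit: Option<usize>) -> Option<usize> {
    if filters.is_empty() {
        limit // no filters: the first `limit` rows can come from the first files
    } else {
        None // filters present: the limit must not restrict which files are scanned
    }
}

fn main() {
    // No filters: a LIMIT 10 scan may stop listing files early.
    assert_eq!(statistic_file_limit(&[], Some(10)), Some(10));

    // With a filter: all files must stay in the listing.
    assert_eq!(statistic_file_limit(&[Filter], Some(10)), None);
}
```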
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]