alamb commented on issue #19654: URL: https://github.com/apache/datafusion/issues/19654#issuecomment-3761992045
> This means for each and every partition, we apply limit separately, I think this is an incorrect implementation? 😓 Do correct me if I am misunderstanding anything here. 🙈

Why do you think this is an incorrect implementation? In addition to the datasource limit, there is also a `LimitExec` somewhere above in the plan -- the datasources may return more than the requested rows.

I think at least one reason that each partition has a limit is that the datasource may not have enough rows (e.g. think `LIMIT 1000` when the file only has 500 rows -- you need to scan across all the files).

> Question 1 --> Given all claims verify, how do we want to proceed? I can get above PR till ListingTable merged, and then we can fix limit pushdown. After which I can work out offset pushdown too. 💃🏻

My suggestion is:

1. Change the pushdown filter rule to preserve both the limit and an `offset` and propagate them to the `DataSource` (right now it only has `limit`), but don't use the offset yet.
2. Update the Parquet opener to use the offset (e.g. I think you could update the `ReadPlan` that is created for each file, and potentially push the offset calculation directly into the parquet reader).

Pushing down offsets instead of just limits may be tricky to do without a big breaking API change (though maybe you could add some backwards-compatible API, `ExecutionPlan::with_limit_offset` for example).

Unfortunately I am not likely to be able to help drive this work, as I am pretty overwhelmed with review work right now.
