backkem commented on issue #7871: URL: https://github.com/apache/arrow-datafusion/issues/7871#issuecomment-1773320638
I think there is merit to both cases. My use case falls more under the former: my table provider wraps a remote DB, and for a simple pagination case I want to fetch only part of a table. Combining a filter with a limit significantly reduces how much data has to be transferred from the DB before it is combined with other data in-process; without the limit, I exceed my latency goals. In this case, the rows that should be returned depend on the sort order, because the limit is restrictive enough that, without an ordering, the result would be arbitrary. I know this may be a somewhat atypical use case, so I'm happy to hear whether or not it can be accommodated.

When working with remote DBs there are also other pushdown opportunities, for example pushing down merges across tables that live in the same DB. That is why I was thinking in the direction of 'execution plan federation'. I've seen similar ideas mentioned in a Substrait talk.
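To illustrate why a limit is only meaningful here together with the sort order, here is a minimal sketch of the SQL a remote-DB table provider might generate when filter, sort, and limit are pushed down together. Everything in it is hypothetical: `SortKey`, `build_remote_query`, and the `can_push_sort` flag are made up for illustration and are not DataFusion's `TableProvider` API. Filters are assumed to be already-rendered, trusted SQL predicates (no quoting or parameter binding is done). The key rule is that the `LIMIT` is pushed only if no ordering is required or the `ORDER BY` was pushed as well; otherwise the remote DB could return an arbitrary subset of rows.

```rust
/// Hypothetical sort key: a column name plus a direction.
#[derive(Debug, Clone)]
struct SortKey {
    column: String,
    ascending: bool,
}

/// Hypothetical helper: renders the query a remote-DB provider could send.
/// `filters` are assumed to be trusted, pre-rendered SQL predicates.
/// `can_push_sort` says whether the remote side can honor the ordering.
fn build_remote_query(
    table: &str,
    filters: &[&str],
    sort: &[SortKey],
    limit: Option<usize>,
    can_push_sort: bool,
) -> String {
    let mut sql = format!("SELECT * FROM {table}");

    // Filters are always safe to push: they only shrink the result.
    if !filters.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&filters.join(" AND "));
    }

    // Push ORDER BY only when an ordering is required and the remote can honor it.
    let sort_pushed = can_push_sort && !sort.is_empty();
    if sort_pushed {
        let keys: Vec<String> = sort
            .iter()
            .map(|k| format!("{} {}", k.column, if k.ascending { "ASC" } else { "DESC" }))
            .collect();
        sql.push_str(" ORDER BY ");
        sql.push_str(&keys.join(", "));
    }

    // A LIMIT without the required ORDER BY would select arbitrary rows,
    // so in that case it is left to be applied locally after sorting.
    if let Some(n) = limit {
        if sort.is_empty() || sort_pushed {
            sql.push_str(&format!(" LIMIT {n}"));
        }
    }
    sql
}

fn main() {
    let sort = vec![SortKey { column: "created_at".to_string(), ascending: false }];
    let q = build_remote_query("events", &["tenant_id = 42"], &sort, Some(50), true);
    println!("{q}");
}
```

If the remote side cannot honor the ordering, the sketch falls back to fetching all filtered rows and leaves the sort and limit to the local engine, which is correct but is exactly the data transfer the pagination case is trying to avoid.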
