Tpt opened a new pull request, #18254: URL: https://github.com/apache/datafusion/pull/18254
Rely on the aggregate `GroupValues` abstraction to build a hash table of the emitted rows, which is used to deduplicate them.

We might make things a bit more efficient by writing a hash table wrapper just for deduplication, but this implementation should give a fair baseline.

## Which issue does this PR close?

- Closes #18140.

## Rationale for this change

Implements deduplicating recursive CTEs (i.e. `UNION` inside of `WITH RECURSIVE`) using a hash table. I reuse the one from aggregates to avoid rebuilding a full wrapper and its per-type specializations.

Each time a batch is returned by the static or the recursive term of the CTE, the hash table is used to remove already-seen rows before the remaining rows are emitted and kept in memory for the next recursion step.

## What changes are included in this PR?

## Are these changes tested?

Yes, some sqllogictests have been added.

## Are there any user-facing changes?

No
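
To illustrate the deduplication loop described in the rationale, here is a minimal sketch in plain Rust. It is not the code from this PR: it swaps the `GroupValues` hash table for a standard `HashSet`, represents a batch as a `Vec` of rows, and the names `dedup_batch` and `recursive_union` are hypothetical helpers introduced only for this example.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper: keeps only the rows of `batch` that were never seen
/// before, and records them in `seen` (a stand-in for the hash table).
fn dedup_batch<R: Clone + Eq + Hash>(seen: &mut HashSet<R>, batch: Vec<R>) -> Vec<R> {
    batch
        .into_iter()
        // `HashSet::insert` returns true only when the value was not present.
        .filter(|row| seen.insert(row.clone()))
        .collect()
}

/// Hypothetical driver for `WITH RECURSIVE ... UNION ...`:
/// `static_term` is the result of the non-recursive term, and
/// `recursive_term` computes the next batch from the previous step's rows.
fn recursive_union<R, F>(static_term: Vec<R>, mut recursive_term: F) -> Vec<R>
where
    R: Clone + Eq + Hash,
    F: FnMut(&[R]) -> Vec<R>,
{
    let mut seen = HashSet::new();
    let mut output = Vec::new();

    // Deduplicate the static term before emitting it.
    let mut working = dedup_batch(&mut seen, static_term);

    // Stop once a recursion step produces no new rows.
    while !working.is_empty() {
        output.extend(working.iter().cloned());
        // Only the newly emitted rows feed the next recursion step.
        let next = recursive_term(&working);
        working = dedup_batch(&mut seen, next);
    }
    output
}
```

With a cyclic input such as the edges `1 -> 2`, `2 -> 3`, `3 -> 1` and a static term containing node `1`, the loop in this sketch terminates because the third step only yields an already-seen row, which is the behavior `UNION` (as opposed to `UNION ALL`) is expected to provide.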
