theirix commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003701483
@2010YOUY01 thank you for pointing this out. @chenkovsky, it looks like both our PRs solve the same sampling problem with different approaches. My PR continues the work on random filtering (as in #13268) by enhancing predicate-based sampling, as previously discussed with @alamb [here](https://github.com/apache/datafusion/issues/13563#issuecomment-2498989436). Sampling semantics differ between databases, and during the implementation and review of my PR we have already begun addressing some subtle semantic differences between Postgres, DuckDB, Hive, etc.