theirix opened a new pull request, #17633: URL: https://github.com/apache/datafusion/pull/17633
## Which issue does this PR close?

- Closes #13563.

## Rationale for this change

The rationale is explained in detail, with known syntax examples, in https://github.com/apache/datafusion/issues/13563.

This is the third design for table sample support:

1. My first design added an explicit rewrite function baked into the select logical plan - #16325
2. The second design introduced dedicated, flexible logical and physical plans, but was tied to datafusion core - #16505
3. This third design abstracts the second design out of datafusion core into extensions.

## What changes are included in this PR?

All changes are bundled into an example file, since this is a PoC of extensibility, as discussed with @alamb in https://github.com/apache/datafusion/issues/13563#issuecomment-3201702314. If the idea is viable, the code could be modularised, which is not possible in a `datafusion-examples` crate.

It adds several components:

- a custom `TableSamplePlanNode` trait for a sampling logical plan
- a query planner `TableSampleQueryPlanner` (trait `QueryPlanner`)
- an execution plan `SampleExec`, mostly adapted from the second design; all kudos and thanks to @chenkovsky!
- an extension planner (trait `ExtensionPlanner`) to build a physical plan
- tests
- an example runner

The setup, as seen in main, is a bit cumbersome, but it works. Building a SQL extension that has access to the AST without introducing a whole new statement (as large projects like arroyo, greptime or cube do when they introduce new syntax) is complicated. For full modularity I would propose a few extension points to `SqlToRel`. They could help avoid manual parsing / logical plan / physical plan / execution conversions and would keep the client syntax concise.

## Are these changes tested?

1. A set of unit tests
2. An example with asserts

## Are there any user-facing changes?

No
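As a rough illustration of what a sampling operator such as `SampleExec` does conceptually, the sketch below keeps each row independently with a given probability (Bernoulli sampling). It is not the code from this PR: `SplitMix64` and `bernoulli_sample` are hypothetical names introduced here, Bernoulli semantics are an assumption, and a DataFusion execution plan would operate on streams of Arrow record batches rather than on slices. Only the standard library is used.

```rust
/// Small deterministic PRNG (SplitMix64), used so the sketch needs no external crates.
/// Hypothetical helper, not part of DataFusion.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits of the next u64.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: keep each row independently with probability `fraction`.
/// The same `seed` always yields the same sample.
fn bernoulli_sample<T: Clone>(rows: &[T], fraction: f64, seed: u64) -> Vec<T> {
    assert!((0.0..=1.0).contains(&fraction), "fraction must be in [0, 1]");
    let mut rng = SplitMix64(seed);
    rows.iter()
        .filter(|_| rng.next_f64() < fraction)
        .cloned()
        .collect()
}
```

For example, `bernoulli_sample(&rows, 0.1, 42)` returns roughly a tenth of `rows`. The exact count varies with the seed, which is the usual trade-off of Bernoulli sampling compared with taking a fixed number of rows.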
