GitHub user rickbatka created a discussion: Logical plan best practices: should I do pre-work at plan time, or Exec time?

I'm building a TableProvider that represents a custom external data source that stores data in partitions in a consistent hash ring. The partitions are trivial to discern from the filters provided. I'm trying to implement filter pushdown, and I can't seem to find good "best practices" guidance on how much work should be done in the logical versus physical stages.

1. Should I determine which shards I need to scan in my logical plan and pass them into the constructor of the new physical plan I create? Or should the logical plan just hand off the raw filters and leave it to the physical plan to sort out?

2. Should I represent each shard scan as a separate Exec node in the physical plan? For example, a logical plan could determine it needs to talk to shards 1, 3, and 5 and therefore create 3 Exec nodes, one for each individual shard scan. Or should I treat my sharding as a black box and put all this logic into a single Exec node, to be determined at runtime?

3. In general, how much "pre-work" should I do in the logical plan? As much as possible? As little as possible?

Any links to readings or presentations on this subject would be appreciated. Thanks in advance!

GitHub link: https://github.com/apache/datafusion/discussions/18156
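
To make the plan-time option from question 1 concrete, here is a minimal sketch in plain Rust with no DataFusion dependency. Everything in it is hypothetical and for illustration only: the `Filter` enum stands in for pushed-down filter expressions, `shard_for_key` uses a simple modulo in place of a real consistent hash ring lookup, and `ShardScan` stands in for a custom Exec node. It assumes the filters are ANDed together, so the candidate shards are the intersection of what each key filter allows.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for a pushed-down filter expression.
#[derive(Debug, Clone)]
enum Filter {
    /// partition_key = 'value'
    KeyEq(String),
    /// partition_key IN ('a', 'b', ...)
    KeyIn(Vec<String>),
    /// Any filter that says nothing about the partition key.
    Other(String),
}

/// Assumed shard count; a real hash ring would carry its own topology.
const NUM_SHARDS: u64 = 8;

/// Stand-in for a ring lookup: hash the key and take it modulo the shard count.
fn shard_for_key(key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % NUM_SHARDS
}

/// Plan-time pruning: start from all shards and intersect with the shards
/// allowed by each key filter. Filters are treated as a conjunction (AND).
fn prune_shards(filters: &[Filter]) -> BTreeSet<u64> {
    let mut candidates: BTreeSet<u64> = (0..NUM_SHARDS).collect();
    for filter in filters {
        let allowed: BTreeSet<u64> = match filter {
            Filter::KeyEq(key) => std::iter::once(shard_for_key(key)).collect(),
            Filter::KeyIn(keys) => keys.iter().map(|k| shard_for_key(k)).collect(),
            // Not a partition-key filter, so it cannot rule out any shard.
            Filter::Other(_) => continue,
        };
        candidates = candidates.intersection(&allowed).copied().collect();
    }
    candidates
}

/// Hypothetical physical node that receives the shard list already resolved.
#[derive(Debug)]
struct ShardScan {
    shards: Vec<u64>,
    /// Filters still have to be applied inside each shard, because a shard
    /// holds many keys besides the ones that matched.
    residual_filters: Vec<Filter>,
}

/// What the table provider would do at plan time under option 1.
fn plan_scan(filters: &[Filter]) -> ShardScan {
    ShardScan {
        shards: prune_shards(filters).into_iter().collect(),
        residual_filters: filters.to_vec(),
    }
}

fn main() {
    let filters = vec![
        Filter::KeyIn(vec!["user-1".to_string(), "user-2".to_string()]),
        Filter::Other("ts > 100".to_string()),
    ];
    let scan = plan_scan(&filters);
    println!("shards to scan: {:?}", scan.shards);
    println!("filters applied per shard: {:?}", scan.residual_filters);

    let point = plan_scan(&[Filter::KeyEq("user-1".to_string())]);
    println!("point lookup scans: {:?}", point.shards);
}
```

Under the alternative in question 1, `plan_scan` would store only the raw filters and `prune_shards` would run when the node executes. Under question 2, the same shard list could either be carried by one node, as here, or be split into one node per entry.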
