stuhood commented on issue #22395:
URL: https://github.com/apache/datafusion/issues/22395#issuecomment-4618822694

   @gene-bordegaray : 
   1. today's discussion (thanks!)
   2. the design doc that I've been working on
   3. reviewing #22657
   
   ...have highlighted one of the tradeoffs of the current definition of Range 
partitioning that I wanted to call out:
   
   ----
   
   As defined, Partitioning::Range contains no overlapping partitions. This is 
aligned with Partitioning::Hash (I can't really imagine what overlap would mean 
there), but allowing for overlapping _output_ partitioning from a 
`TableProvider`, while continuing to require that _input_ partitioning is total 
and non-overlapping (is this a `enum Distribution`...?) would have some really 
interesting properties.
   
   In order to produce a reasonable number of Range partitions _which match the 
other side of the join_, the underlying data will either:
   1. `DISJOINT` - already need to be over-partitioned into a series of 
disjoint ranges, in which case the creator of the `TableProvider` would 
merge/aggregate a few of the underlying files/partitions into the declared 
Ranges (e.g. go from 1000 underlying partitions to 16 declared 
Partitioning::Range partitions)
   2. `NOT-DISJOINT` - if the underlying files/partitions are _not_-disjoint / 
are-overlapping, then the `TableProvider` will need to:
       1. dynamically choose partition boundaries (using statistics about the 
amount of overlap, size, etc)
       2. bucket the files into those partitions 
       3. push down or execute range filters for cases where a file spans a 
partition boundary (to guarantee that data is emitted in exactly one partition)
   
   ----
   
   All of this is fine, and we're prepared to do it in our implementation. But 
if `Partitioning::Range` supported overlapping output partitions, things would 
look a little bit different:
   
   In that case instead, a wrapper (?) around the `TableProvider` might 
implement the steps from the `NOT-DISJOINT` case above: it would take the 
underlying overlapped partitions, consume their statistics (generically), and 
push down static per-file filters based on the chosen boundaries, and then 
coalesce their outputs.
   
   The benefit of this would be one upstream implementation of this range 
filtering... and possibly also some amount of sharing of the coordination 
required _between_ the two sides of a join, which we have been planning to 
implement with an optimizer rule inserted at the appropriate spot to inspect 
both sides of the join, and align their Partitioning::Range declaration based 
on their shared statistics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to