gene-bordegaray commented on code in PR #22207:
URL: https://github.com/apache/datafusion/pull/22207#discussion_r3273970550


##########
datafusion/physical-expr/src/partitioning.rs:
##########
@@ -133,13 +136,225 @@ impl Display for Partitioning {
                     .join(", ");
                 write!(f, "Hash([{phy_exprs_str}], {size})")
             }
+            Partitioning::Range(range) => write!(f, "{range}"),
             Partitioning::UnknownPartitioning(size) => {
                 write!(f, "UnknownPartitioning({size})")
             }
         }
     }
 }
 
+/// Physical range partitioning.
+///
+/// [`RangePartitioning`] describes range bounds over one or more physical
+/// expressions. Each [`RangePartition`] represents one output partition and 
must
+/// contain exactly one [`RangeInterval`] for each partition expression.
+///
+/// The source declaring this partitioning is responsible for ensuring that, 
for
+/// every emitted row, the row belongs to exactly one partition and is emitted 
by
+/// that partition. The declared ranges do not need to cover values that the 
plan
+/// cannot emit.
+///
+/// Each lower and upper bound explicitly records whether it is inclusive.
+/// Unbounded sides are represented with `None`, bound values should be 
non-null
+/// until null routing semantics are defined.
+///
+/// For example, a scan can declare date and city range partitions as:
+///
+/// ```text
+/// exprs = [date, city]
+///
+/// partition 0:
+///   date in [2021-01-01, 2022-01-01)
+///   city in [Allston, Boston)
+///
+/// partition 1:
+///   date in [2021-01-01, 2022-01-01)
+///   city in [Boston, NYC)
+/// ```

Review Comment:
   I looked at ClickHouse and InfluxDB, I foudn that they store physical 
partitioning metadata, but did not find anything like a “multi-dimensional 
repartition this row.”
   
   I looked into systems that try to do a true multi-dimensional partitioning 
and there aren't many that really do it. I think fo good reason. It would treat 
the columns like `time` and `city` as independent axes, which in simple cases 
is great and easy but when things start to overlap or more nuanced it seem we 
would need a routing structure like a grid/sparse map/KDB-tree (these were very 
complicated).
   
   The closest thing I found was in Sedona where they do spatial partitioning 
using quadtree and kdbtree:
   - 
https://sedona.apache.org/1.7.1/api/rdocs/reference/sedona_apply_spatial_partitioner.html
   - Quadtree: https://www.geeksforgeeks.org/dsa/quad-tree/
   - KDBTree: https://en.wikipedia.org/wiki/K-D-B-tree
   
   With compound-key range partitioning it is more clear and still efficient on 
repartition routing: 
   ```text
   1. evaluate `(time, city)` as one ordered key
   2. binary-search split points
   3. route to a partition. 
   ```
   
   Compound-key range partitioning should cover most join/planner cases like 
@stuhood mentioned. We are typicaly asking "are the two sides of this join 
compatible" for things like dynamic filters. The thing it lacks compare to true 
multi-dimensional partitioning is independent routing. So, for example, it 
cannot directly represent “time bucket X and city bucket Y map to partition P” 
which is useful when we want to do optimizations on each axis independently 
like pruning on the individual columns:
   
   ```sql
   WHERE time >= '2022' AND time < '2023'
   AND city >= 'Boston' AND city < 'NYC'
   ```
   So I think compound-key range partitioning is the right move. If there is a 
use for this I would say that this should be its own separate implementation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to