gene-bordegaray commented on code in PR #22207:
URL: https://github.com/apache/datafusion/pull/22207#discussion_r3275952933
##########
datafusion/physical-expr/src/partitioning.rs:
##########
@@ -133,13 +137,176 @@ impl Display for Partitioning {
.join(", ");
write!(f, "Hash([{phy_exprs_str}], {size})")
}
+ Partitioning::Range(range) => write!(f, "{range}"),
Partitioning::UnknownPartitioning(size) => {
write!(f, "UnknownPartitioning({size})")
}
}
}
}
+/// Physical range partitioning.
+///
+/// [`RangePartitioning`] describes an ordered key space with split points.
+///
+/// - `sort_exprs` define the partitioning key and ordering.
+/// - `split_points` define the boundaries between adjacent partitions. Each
+/// split point is a tuple with one [`ScalarValue`] per sort expression.
+/// - The declaring source must ensure every emitted row belongs to exactly one
+/// declared partition and is emitted by that partition.
+///
+/// The sort expressions must be non-empty, and split points must be strictly
+/// ordered according to those sort expressions.
+///
+/// For a single range key:
+///
+/// ```text
+/// sort_exprs = [date ASC NULLS LAST]
+/// split_points = [
+/// (2022-01-01),
+/// (2023-01-01),
+/// ]
+///
+/// partition 0: date before 2022-01-01
+/// partition 1: date between 2022-01-01 and 2023-01-01
+/// partition 2: date at/after 2023-01-01
+/// ```
+///
+/// The same model extends to compound keys.
+/// For `sort_exprs = [time ASC, city ASC]`, split points are ordered
+/// lexicographically by `(time, city)`:
+///
+/// ```text
+/// sort_exprs = [time ASC NULLS LAST, city ASC NULLS LAST]
+/// split_points = [
+/// (2022, Allston),
+/// (2023, Allston),
+/// ]
+///
+/// partition 0: keys before (2022, Allston)
+/// partition 1: keys between (2022, Allston) and (2023, Allston)
+/// partition 2: keys at/after (2023, Allston)
+/// ```
+///
+/// NOTE: Optimizer and execution behavior for this partitioning is intentionally
+/// not implemented and will be introduced incrementally.
+#[derive(Debug, Clone)]
+pub struct RangePartitioning {
+ /// Ordered partitioning key. Sort options are part of the partitioning
+ /// because `ASC`/`DESC` and null ordering decide which side of a split point
+ /// a row belongs to.
+ sort_exprs: Vec<PhysicalSortExpr>,
+ /// Boundaries between adjacent partitions. `N` split points define `N + 1`
+ /// lower-inclusive, upper-exclusive partitions. Values equal to a split
+ /// point belong to the partition after that split point.
+ split_points: Vec<Vec<ScalarValue>>,
Review Comment:
yes, good point thank you
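
For reference, the lower-inclusive, upper-exclusive rule in the doc comment can be sketched as follows. This is only an illustration and not code from the PR: `partition_index` is a hypothetical helper, and it assumes all-ascending keys with no nulls so that Rust's built-in lexicographic tuple ordering can stand in for `sort_exprs` and `ScalarValue` comparison.

```rust
/// Hypothetical helper (not part of the PR): returns the partition index for
/// `key`, given split points in strictly ascending order.
///
/// Assumes every sort expression is ascending and keys contain no nulls, so
/// the type's `Ord` implementation matches the partitioning order.
fn partition_index<K: Ord>(split_points: &[K], key: &K) -> usize {
    // Count the split points that are <= key. A key equal to a split point
    // therefore lands in the partition after that split point, which makes
    // each partition lower-inclusive and upper-exclusive.
    split_points.partition_point(|split| split <= key)
}
```

With the compound-key example from the doc comment, split points `[(2022, "Allston"), (2023, "Allston")]` would place `(2021, "Boston")` in partition 0, both `(2022, "Allston")` and `(2022, "Boston")` in partition 1, and `(2023, "Allston")` in partition 2. `DESC` or non-default null ordering would need a custom comparator rather than `Ord`.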
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]