alamb commented on code in PR #19893:
URL: https://github.com/apache/datafusion/pull/19893#discussion_r2722995668
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -544,11 +544,11 @@ pub struct AggregateExec {
/// Aggregation mode (full, partial)
mode: AggregateMode,
/// Group by expressions
- group_by: PhysicalGroupBy,
+ group_by: Arc<PhysicalGroupBy>,
Review Comment:
likewise here I think it would be good to note they are Arc to make
clone/plan rewriting faster
##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -125,7 +125,7 @@ impl From<ProjectionExpr> for (Arc<dyn PhysicalExpr>,
String) {
/// indices.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProjectionExprs {
- exprs: Vec<ProjectionExpr>,
+ exprs: Arc<[ProjectionExpr]>,
Review Comment:
I think it is worth commenting here the rationale for using Arc<[...]>,
namely that it makes this structure inexpensive to copy as happens during
PhysicalPlanning
##########
datafusion/physical-plan/src/projection.rs:
##########
@@ -498,6 +506,146 @@ impl RecordBatchStream for ProjectionStream {
}
}
+/// Describes an option immutable reference counted shared projection.
+#[derive(Debug, Clone, Default, PartialEq, Eq)]
+pub struct OptionProjectionRef {
+ inner: Option<Arc<[usize]>>,
+}
+
+impl OptionProjectionRef {
+ /// Make a new [`OptionProjectionRef`].
+ pub fn new(inner: Option<impl Into<Arc<[usize]>>>) -> Self {
+ Self {
+ inner: inner.map(Into::into),
+ }
+ }
+
+ /// Project inner.
+ pub fn as_inner(&self) -> &Option<Arc<[usize]>> {
+ &self.inner
+ }
+
+ /// Consume self and return inner.
+ pub fn into_inner(self) -> Option<Arc<[usize]>> {
+ self.inner
+ }
+
+ /// Represent this projection as option slice.
+ pub fn as_ref(&self) -> Option<&[usize]> {
+ self.inner.as_deref()
+ }
+
+ /// Check if the projection is set.
+ pub fn is_some(&self) -> bool {
+ self.inner.is_some()
+ }
+
+ /// Check if the projection is not set.
+ pub fn is_none(&self) -> bool {
+ self.inner.is_none()
+ }
+
+ /// Apply passed `projection` to inner one.
+ ///
+ /// If inner projection is [`None`] then there are no changes.
+ /// Otherwise, if passed `projection` is not [`None`] then it is remapped
+ /// according to the stored one. Otherwise, there are no changes.
+ ///
+ /// # Example
+ ///
+ /// If stored projection is [0, 2] and we call `apply_projection([0, 2,
3])`,
+ /// then the resulting projection will be [0, 3].
+ ///
+ /// # Error
+ ///
+ /// Returns an internal error if existing projection contains index that is
+ /// greater than len of the passed `projection`.
+ ///
+ pub fn apply_projection<'a>(
Review Comment:
I see this is basically refactored out of NestedLoops join (and maybe
elsewhere)
##########
datafusion/physical-plan/src/projection.rs:
##########
@@ -498,6 +506,146 @@ impl RecordBatchStream for ProjectionStream {
}
}
+/// Describes an option immutable reference counted shared projection.
Review Comment:
I see why you need something like this (as it changes the signture of
TableProvider) but this seems inconsistent with the other representations of
projections in DataFusion/Arrow I am familiar with, which are represented with
an `Option` rather than having the Option internally
I recommend:
1. Move this into `datafusion/physical-expr/src/projection.rs` (not in
physical plan) so it is near `ProjectionExprs`
2. Move the Option outside (so the signature is `Option<ProjectionRef>`
rather than `OptionProjectionRef`)
##########
datafusion/common/src/utils/mod.rs:
##########
@@ -70,9 +70,9 @@ use std::thread::available_parallelism;
/// ```
pub fn project_schema(
schema: &SchemaRef,
- projection: Option<&Vec<usize>>,
+ projection: Option<&impl AsRef<[usize]>>,
Review Comment:
Per @crepererum 's suggestion, I tried out what it would look like and came
up with
- https://github.com/askalt/datafusion/pull/3
It does seem to be reasonable
The original signature is `Option<&Vec<..>>` I think to align with
TableProvider::scan (which also shouldn't have a owned Vec, but that I think is
a historic accident)
##########
datafusion/physical-plan/src/projection.rs:
##########
@@ -498,6 +506,146 @@ impl RecordBatchStream for ProjectionStream {
}
}
+/// Describes an option immutable reference counted shared projection.
Review Comment:
I also recommend documenting *why* this structure exists (because it is
cheap to `clone`)
Something like
```rust
/// This structure represents projecting a set of columns by index.
/// It uses an `Arc` internally to make it cheap to clone
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]