alamb commented on code in PR #8022:
URL: https://github.com/apache/arrow-datafusion/pull/8022#discussion_r1379866981
##########
datafusion/physical-expr/src/partitioning.rs:
##########
@@ -15,14 +15,43 @@
// specific language governing permissions and limitations
// under the License.
-//! [`Partitioning`] and [`Distribution`] for physical expressions
+//! [`Partitioning`] and [`Distribution`] for `ExecutionPlans`
use std::fmt;
use std::sync::Arc;
use crate::{expr_list_eq_strict_order, EquivalenceProperties, PhysicalExpr};
-/// Partitioning schemes supported by operators.
+/// Partitioning schemes supported by [`ExecutionPlan`]s.
+///
+/// A partition represents an independent stream that an `ExecutionPlan` can
+/// produce in parallel. Each `ExecutionPlan` must produce at least one
+/// partition, and the number of partitions varies based on the input and the
+/// operation performed.
+///
+/// ```text
+/// ▲ ▲ ▲
+/// │ │ │
+/// │ │ │ An ExecutionPlan with 3
+/// │ │ │ output partitions will
+/// ┌───┴──────┴──────┴──┐ produce 3 streams of
+/// │ ExecutionPlan │ RecordBatches that run in
+/// └────────────────────┘ parallel.
Review Comment:
Good idea -- I will do that.
##########
datafusion/physical-expr/src/partitioning.rs:
##########
@@ -15,14 +15,43 @@
// specific language governing permissions and limitations
// under the License.
-//! [`Partitioning`] and [`Distribution`] for physical expressions
+//! [`Partitioning`] and [`Distribution`] for `ExecutionPlans`
use std::fmt;
use std::sync::Arc;
use crate::{expr_list_eq_strict_order, EquivalenceProperties, PhysicalExpr};
-/// Partitioning schemes supported by operators.
+/// Partitioning schemes supported by [`ExecutionPlan`]s.
+///
+/// A partition represents an independent stream that an `ExecutionPlan` can
+/// produce in parallel. Each `ExecutionPlan` must produce at least one
+/// partition, and the number of partitions varies based on the input and the
+/// operation performed.
+///
+/// ```text
+/// ▲ ▲ ▲
+/// │ │ │
+/// │ │ │ An ExecutionPlan with 3
+/// │ │ │ output partitions will
+/// ┌───┴──────┴──────┴──┐ produce 3 streams of
+/// │ ExecutionPlan │ RecordBatches that run in
+/// └────────────────────┘ parallel.
+/// ```
+///
+/// # Examples
+///
+/// A simple `FileScanExec` might produce one output stream (partition) for each
+/// file (note the actual DataFusion file scanners can read individual files in
+/// parallel, potentially producing multiple partitions per file).
+///
+/// Plans such as `SortPreservingMerge` produce a single output stream
+/// (1 output partition) by combining some number of input streams (input partitions).
+///
+/// Plans such as `FilterExec` produce the same number of output streams
+/// (partitions) as input streams (partitions).
+///
+/// [`ExecutionPlan`]: crate::physical_plan::ExecutionPlan
Review Comment:
I would probably phrase it differently, like "the result of executing a
Partition is an async stream (a kind of future)" or something. I'll try to
clarify.
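
To make the partition model in the doc comment concrete, here is a rough, std-only sketch. Everything in it is hypothetical: `ToyPlan`, `ToyScan`, `ToyFilter`, `ToyMerge`, and `collect_partitions` are illustration names, not DataFusion APIs, and a blocking iterator driven on its own thread stands in for the async stream that executing a partition would actually return.

```rust
use std::thread;

/// Hypothetical stand-in for a RecordBatch: just a vector of values.
type Batch = Vec<i64>;

/// Hypothetical, simplified plan trait (NOT DataFusion's `ExecutionPlan`).
/// Executing one partition yields one independent stream of batches.
trait ToyPlan {
    fn output_partition_count(&self) -> usize;
    fn execute(&self, partition: usize) -> Box<dyn Iterator<Item = Batch> + Send>;
}

/// Scan with one output partition per "file".
struct ToyScan {
    files: Vec<Vec<Batch>>,
}

impl ToyPlan for ToyScan {
    fn output_partition_count(&self) -> usize {
        self.files.len()
    }
    fn execute(&self, partition: usize) -> Box<dyn Iterator<Item = Batch> + Send> {
        Box::new(self.files[partition].clone().into_iter())
    }
}

/// Filter: same number of output partitions as its input.
struct ToyFilter<P: ToyPlan> {
    input: P,
    min: i64,
}

impl<P: ToyPlan> ToyPlan for ToyFilter<P> {
    fn output_partition_count(&self) -> usize {
        self.input.output_partition_count()
    }
    fn execute(&self, partition: usize) -> Box<dyn Iterator<Item = Batch> + Send> {
        let min = self.min;
        // Keep only values >= min within each batch of this partition.
        Box::new(self.input.execute(partition).map(move |batch: Batch| {
            batch.into_iter().filter(|v| *v >= min).collect::<Batch>()
        }))
    }
}

/// Merge: always one output partition, built from all input partitions.
/// This sketch simply concatenates the inputs; a real sort-preserving
/// merge would interleave rows to keep the sort order.
struct ToyMerge<P: ToyPlan> {
    input: P,
}

impl<P: ToyPlan> ToyPlan for ToyMerge<P> {
    fn output_partition_count(&self) -> usize {
        1
    }
    fn execute(&self, partition: usize) -> Box<dyn Iterator<Item = Batch> + Send> {
        assert_eq!(partition, 0, "ToyMerge has a single output partition");
        let streams: Vec<_> = (0..self.input.output_partition_count())
            .map(|i| self.input.execute(i))
            .collect();
        Box::new(streams.into_iter().flatten())
    }
}

/// Drive every output partition on its own thread and gather the results,
/// returning one Vec of batches per partition.
fn collect_partitions(plan: &dyn ToyPlan) -> Vec<Vec<Batch>> {
    let handles: Vec<_> = (0..plan.output_partition_count())
        .map(|i| {
            let stream = plan.execute(i);
            thread::spawn(move || stream.collect::<Vec<Batch>>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let scan = ToyScan {
        files: vec![vec![vec![1, 5], vec![9]], vec![vec![2, 7]], vec![vec![3]]],
    };
    let filter = ToyFilter { input: scan, min: 3 };
    println!("filter partitions: {}", filter.output_partition_count());
    println!("filter output: {:?}", collect_partitions(&filter));

    let merge = ToyMerge { input: filter };
    println!("merge partitions: {}", merge.output_partition_count());
    println!("merge output: {:?}", collect_partitions(&merge));
}
```

The point of the sketch is only the partition counts: the scan has one partition per file, the filter preserves its input's partition count, and the merge collapses everything into one.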