[GitHub] [arrow-datafusion] metesynnada commented on a diff in pull request #7331: Minor: Improve docstrings for `LogicalPlan`

via GitHub Fri, 18 Aug 2023 07:34:52 -0700


metesynnada commented on code in PR #7331:
URL: https://github.com/apache/arrow-datafusion/pull/7331#discussion_r1298525312



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -69,63 +69,84 @@ pub enum LogicalPlan {
     /// expression (essentially a WHERE clause with a predicate
     /// expression).
     ///
-    /// Semantically, `<predicate>` is evaluated for each row of the input;
-    /// If the value of `<predicate>` is true, the input row is passed to
-    /// the output. If the value of `<predicate>` is false, the row is
-    /// discarded.
+    /// Semantically, `<predicate>` is evaluated for each row of the
+    /// input; If the value of `<predicate>` is true, the input row is
+    /// passed to the output. If the value of `<predicate>` is false
+    /// (or null), the row is discarded.
     Filter(Filter),
-    /// Window its input based on a set of window spec and window function (e.g. SUM or RANK)
+    /// Windows input based on a set of window spec and window
+    /// function (e.g. SUM or RANK).  This is used to implement SQL
+    /// window functions, and the `OVER` clause.
     Window(Window),
     /// Aggregates its input based on a set of grouping and aggregate
-    /// expressions (e.g. SUM).
+    /// expressions (e.g. SUM). This is used to implement SQL aggregates
+    /// and `GROUP BY`.
     Aggregate(Aggregate),
-    /// Sorts its input according to a list of sort expressions.
+    /// Sorts its input according to a list of sort expressions. This
+    /// is used to implement SQL `ORDER BY`
     Sort(Sort),
-    /// Join two logical plans on one or more join columns
+    /// Join two logical plans on one or more join columns.
+    /// This is used to implement SQL `JOIN`
     Join(Join),
-    /// Apply Cross Join to two logical plans
+    /// Apply Cross Join to two logical plans.
+    /// This is used to implement SQL `CROSS JOIN`
     CrossJoin(CrossJoin),
-    /// Repartition the plan based on a partitioning scheme
+    /// Repartitions the input based on a partitioning scheme. This is
+    /// used to add parallelism and is sometimes referred to as an
+    /// "exchange" operator in other systems
     Repartition(Repartition),
-    /// Union multiple inputs
+    /// Union multiple inputs with the same schema into a single
+    /// output stream. This is used to implement SQL `UNION [ALL]` and
+    /// `INTERSECT [ALL]`.
     Union(Union),
-    /// Produces rows from a table provider by reference or from the context
+    /// Produces rows from a [`TableSource`], used to implement SQL
+    /// `FROM` tables or views.
     TableScan(TableScan),
-    /// Produces no rows: An empty relation with an empty schema
+    /// Produces no rows: An empty relation with an empty schema that
+    /// produces 0 or 1 rows. This is used to implement SQL `SELECT`
+    /// that has no values in the `FROM` clause.
     EmptyRelation(EmptyRelation),
-    /// Subquery
+    /// Produces the output of running another query.  This is used to
+    /// implement SQL subqueries
     Subquery(Subquery),
     /// Aliased relation provides, or changes, the name of a relation.
     SubqueryAlias(SubqueryAlias),
     /// Skip some number of rows, and then fetch some number of rows.
     Limit(Limit),
-    /// [`Statement`]
+    /// A DataFusion [`Statement`] such as `SET VARIABLE` or `START TRANSACTION`
     Statement(Statement),
     /// Values expression. See
     /// [Postgres VALUES](https://www.postgresql.org/docs/current/queries-values.html)
-    /// documentation for more details.
+    /// documentation for more details. This is used to implement SQL such as
+    /// `VALUES (1, 2), (3, 4)`
     Values(Values),
     /// Produces a relation with string representations of
-    /// various parts of the plan
+    /// various parts of the plan. This is used to implement SQL `EXPLAIN`.
     Explain(Explain),
-    /// Runs the actual plan, and then prints the physical plan with
-    /// with execution metrics.
+    /// Runs the input, and prints annotated physical plan as a string
+    /// with with execution metric. This is used to implement SQL
+    /// `EXPLAIN ANALYZE`.
     Analyze(Analyze),
-    /// Extension operator defined outside of DataFusion
+    /// Extension operator defined outside of DataFusion. This is used
+    /// to extend DataFusion with custom relational operations that
     Extension(Extension),
-    /// Remove duplicate rows from the input
+    /// Remove duplicate rows from the input. This is used to
+    /// implement SQL `SELECT DISTINCT ...`.
     Distinct(Distinct),
-    /// Prepare a statement
+    /// Prepare a statement and find any bind parameters
+    /// (e.g. `?`). This is used to implement SQL prepared statements.
     Prepare(Prepare),
-    /// Insert / Update / Delete
+    /// Data Manipulaton Language (DML): Insert / Update / Delete
     Dml(DmlStatement),
-    /// CREATE / DROP TABLES / VIEWS / SCHEMAs
+    /// Data Definition Language (DDL): CREATE / DROP TABLES / VIEWS / SCHEMAs
     Ddl(DdlStatement),
-    /// COPY TO
+    /// `COPY TO` for writing plan results to files
     Copy(CopyTo),
-    /// Describe the schema of table
+    /// Describe the schema of table. This is used to implement the
+    /// SQL `DESCRIBE` command from MySQL.

Review Comment:
   ```suggestion
       /// Describe the schema of the table. This is used to implement the
       /// SQL `DESCRIBE` command from MySQL.
   ```
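
A note on the `Filter` docstring change in the hunk above ("false (or null), the row is discarded"): the added wording reflects SQL's three-valued logic, where a predicate can evaluate to true, false, or null, and only true keeps the row. The following is a minimal sketch of that rule, not DataFusion code. `Row`, `score_gt_10`, and `filter_rows` are hypothetical names invented for illustration, and `Option<bool>` with `None` is an assumed stand-in for a SQL null.

```rust
/// Hypothetical stand-in for an input row; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i32,
    /// `None` models a SQL NULL in this column.
    score: Option<i32>,
}

/// Three-valued predicate `score > 10`: a NULL input yields NULL (`None`).
fn score_gt_10(row: &Row) -> Option<bool> {
    row.score.map(|s| s > 10)
}

/// Keeps a row only when the predicate is exactly true.
/// Both `Some(false)` and `None` (null) discard the row.
fn filter_rows(rows: &[Row], predicate: fn(&Row) -> Option<bool>) -> Vec<Row> {
    rows.iter()
        .filter(|r| predicate(r) == Some(true))
        .cloned()
        .collect()
}

fn main() {
    let rows = vec![
        Row { id: 1, score: Some(42) }, // predicate is true: kept
        Row { id: 2, score: Some(3) },  // predicate is false: discarded
        Row { id: 3, score: None },     // predicate is null: discarded
    ];
    let kept = filter_rows(&rows, score_gt_10);
    let ids: Vec<i32> = kept.iter().map(|r| r.id).collect();
    println!("{:?}", ids);
}
```

Comparing against `Some(true)` rather than unwrapping with a default makes the "null is not true" rule explicit at the point where rows are dropped.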



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -69,63 +69,84 @@ pub enum LogicalPlan {
     /// expression (essentially a WHERE clause with a predicate
     /// expression).
     ///
-    /// Semantically, `<predicate>` is evaluated for each row of the input;
-    /// If the value of `<predicate>` is true, the input row is passed to
-    /// the output. If the value of `<predicate>` is false, the row is
-    /// discarded.
+    /// Semantically, `<predicate>` is evaluated for each row of the
+    /// input; If the value of `<predicate>` is true, the input row is
+    /// passed to the output. If the value of `<predicate>` is false
+    /// (or null), the row is discarded.
     Filter(Filter),
-    /// Window its input based on a set of window spec and window function (e.g. SUM or RANK)
+    /// Windows input based on a set of window spec and window
+    /// function (e.g. SUM or RANK).  This is used to implement SQL
+    /// window functions, and the `OVER` clause.
     Window(Window),
     /// Aggregates its input based on a set of grouping and aggregate
-    /// expressions (e.g. SUM).
+    /// expressions (e.g. SUM). This is used to implement SQL aggregates
+    /// and `GROUP BY`.
     Aggregate(Aggregate),
-    /// Sorts its input according to a list of sort expressions.
+    /// Sorts its input according to a list of sort expressions. This
+    /// is used to implement SQL `ORDER BY`
     Sort(Sort),
-    /// Join two logical plans on one or more join columns
+    /// Join two logical plans on one or more join columns.
+    /// This is used to implement SQL `JOIN`
     Join(Join),
-    /// Apply Cross Join to two logical plans
+    /// Apply Cross Join to two logical plans.
+    /// This is used to implement SQL `CROSS JOIN`
     CrossJoin(CrossJoin),
-    /// Repartition the plan based on a partitioning scheme
+    /// Repartitions the input based on a partitioning scheme. This is
+    /// used to add parallelism and is sometimes referred to as an
+    /// "exchange" operator in other systems
     Repartition(Repartition),
-    /// Union multiple inputs
+    /// Union multiple inputs with the same schema into a single
+    /// output stream. This is used to implement SQL `UNION [ALL]` and
+    /// `INTERSECT [ALL]`.
     Union(Union),
-    /// Produces rows from a table provider by reference or from the context
+    /// Produces rows from a [`TableSource`], used to implement SQL
+    /// `FROM` tables or views.
     TableScan(TableScan),
-    /// Produces no rows: An empty relation with an empty schema
+    /// Produces no rows: An empty relation with an empty schema that
+    /// produces 0 or 1 rows. This is used to implement SQL `SELECT`
+    /// that has no values in the `FROM` clause.
     EmptyRelation(EmptyRelation),
-    /// Subquery
+    /// Produces the output of running another query.  This is used to
+    /// implement SQL subqueries
     Subquery(Subquery),
     /// Aliased relation provides, or changes, the name of a relation.
     SubqueryAlias(SubqueryAlias),
     /// Skip some number of rows, and then fetch some number of rows.
     Limit(Limit),
-    /// [`Statement`]
+    /// A DataFusion [`Statement`] such as `SET VARIABLE` or `START TRANSACTION`
     Statement(Statement),
     /// Values expression. See
     /// [Postgres VALUES](https://www.postgresql.org/docs/current/queries-values.html)
-    /// documentation for more details.
+    /// documentation for more details. This is used to implement SQL such as
+    /// `VALUES (1, 2), (3, 4)`
     Values(Values),
     /// Produces a relation with string representations of
-    /// various parts of the plan
+    /// various parts of the plan. This is used to implement SQL `EXPLAIN`.
     Explain(Explain),
-    /// Runs the actual plan, and then prints the physical plan with
-    /// with execution metrics.
+    /// Runs the input, and prints annotated physical plan as a string
+    /// with with execution metric. This is used to implement SQL
+    /// `EXPLAIN ANALYZE`.

Review Comment:
   ```suggestion
       /// Runs the input, and prints annotated physical plan as a string
       /// with execution metric. This is used to implement SQL
       /// `EXPLAIN ANALYZE`.
   ```



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -69,63 +69,84 @@ pub enum LogicalPlan {
     /// expression (essentially a WHERE clause with a predicate
     /// expression).
     ///
-    /// Semantically, `<predicate>` is evaluated for each row of the input;
-    /// If the value of `<predicate>` is true, the input row is passed to
-    /// the output. If the value of `<predicate>` is false, the row is
-    /// discarded.
+    /// Semantically, `<predicate>` is evaluated for each row of the
+    /// input; If the value of `<predicate>` is true, the input row is
+    /// passed to the output. If the value of `<predicate>` is false
+    /// (or null), the row is discarded.
     Filter(Filter),
-    /// Window its input based on a set of window spec and window function (e.g. SUM or RANK)
+    /// Windows input based on a set of window spec and window
+    /// function (e.g. SUM or RANK).  This is used to implement SQL
+    /// window functions, and the `OVER` clause.
     Window(Window),
     /// Aggregates its input based on a set of grouping and aggregate
-    /// expressions (e.g. SUM).
+    /// expressions (e.g. SUM). This is used to implement SQL aggregates
+    /// and `GROUP BY`.
     Aggregate(Aggregate),
-    /// Sorts its input according to a list of sort expressions.
+    /// Sorts its input according to a list of sort expressions. This
+    /// is used to implement SQL `ORDER BY`
     Sort(Sort),
-    /// Join two logical plans on one or more join columns
+    /// Join two logical plans on one or more join columns.
+    /// This is used to implement SQL `JOIN`
     Join(Join),
-    /// Apply Cross Join to two logical plans
+    /// Apply Cross Join to two logical plans.
+    /// This is used to implement SQL `CROSS JOIN`
     CrossJoin(CrossJoin),
-    /// Repartition the plan based on a partitioning scheme
+    /// Repartitions the input based on a partitioning scheme. This is
+    /// used to add parallelism and is sometimes referred to as an
+    /// "exchange" operator in other systems
     Repartition(Repartition),
-    /// Union multiple inputs
+    /// Union multiple inputs with the same schema into a single
+    /// output stream. This is used to implement SQL `UNION [ALL]` and
+    /// `INTERSECT [ALL]`.
     Union(Union),
-    /// Produces rows from a table provider by reference or from the context
+    /// Produces rows from a [`TableSource`], used to implement SQL
+    /// `FROM` tables or views.
     TableScan(TableScan),
-    /// Produces no rows: An empty relation with an empty schema
+    /// Produces no rows: An empty relation with an empty schema that
+    /// produces 0 or 1 rows. This is used to implement SQL `SELECT`
+    /// that has no values in the `FROM` clause.
     EmptyRelation(EmptyRelation),
-    /// Subquery
+    /// Produces the output of running another query.  This is used to
+    /// implement SQL subqueries
     Subquery(Subquery),
     /// Aliased relation provides, or changes, the name of a relation.
     SubqueryAlias(SubqueryAlias),
     /// Skip some number of rows, and then fetch some number of rows.
     Limit(Limit),
-    /// [`Statement`]
+    /// A DataFusion [`Statement`] such as `SET VARIABLE` or `START TRANSACTION`
     Statement(Statement),
     /// Values expression. See
     /// [Postgres VALUES](https://www.postgresql.org/docs/current/queries-values.html)
-    /// documentation for more details.
+    /// documentation for more details. This is used to implement SQL such as
+    /// `VALUES (1, 2), (3, 4)`
     Values(Values),
     /// Produces a relation with string representations of
-    /// various parts of the plan
+    /// various parts of the plan. This is used to implement SQL `EXPLAIN`.
     Explain(Explain),
-    /// Runs the actual plan, and then prints the physical plan with
-    /// with execution metrics.
+    /// Runs the input, and prints annotated physical plan as a string
+    /// with with execution metric. This is used to implement SQL
+    /// `EXPLAIN ANALYZE`.
     Analyze(Analyze),
-    /// Extension operator defined outside of DataFusion
+    /// Extension operator defined outside of DataFusion. This is used
+    /// to extend DataFusion with custom relational operations that
     Extension(Extension),
-    /// Remove duplicate rows from the input
+    /// Remove duplicate rows from the input. This is used to
+    /// implement SQL `SELECT DISTINCT ...`.
     Distinct(Distinct),
-    /// Prepare a statement
+    /// Prepare a statement and find any bind parameters
+    /// (e.g. `?`). This is used to implement SQL prepared statements.

Review Comment:
   ```suggestion
       /// Prepare a statement and find any bind parameters
       /// (e.g. `?`). This is used to implement SQL-prepared statements.
   ```



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -69,63 +69,84 @@ pub enum LogicalPlan {
     /// expression (essentially a WHERE clause with a predicate
     /// expression).
     ///
-    /// Semantically, `<predicate>` is evaluated for each row of the input;
-    /// If the value of `<predicate>` is true, the input row is passed to
-    /// the output. If the value of `<predicate>` is false, the row is
-    /// discarded.
+    /// Semantically, `<predicate>` is evaluated for each row of the
+    /// input; If the value of `<predicate>` is true, the input row is
+    /// passed to the output. If the value of `<predicate>` is false
+    /// (or null), the row is discarded.
     Filter(Filter),
-    /// Window its input based on a set of window spec and window function (e.g. SUM or RANK)
+    /// Windows input based on a set of window spec and window
+    /// function (e.g. SUM or RANK).  This is used to implement SQL
+    /// window functions, and the `OVER` clause.
     Window(Window),
     /// Aggregates its input based on a set of grouping and aggregate
-    /// expressions (e.g. SUM).
+    /// expressions (e.g. SUM). This is used to implement SQL aggregates
+    /// and `GROUP BY`.
     Aggregate(Aggregate),
-    /// Sorts its input according to a list of sort expressions.
+    /// Sorts its input according to a list of sort expressions. This
+    /// is used to implement SQL `ORDER BY`
     Sort(Sort),
-    /// Join two logical plans on one or more join columns
+    /// Join two logical plans on one or more join columns.
+    /// This is used to implement SQL `JOIN`
     Join(Join),
-    /// Apply Cross Join to two logical plans
+    /// Apply Cross Join to two logical plans.
+    /// This is used to implement SQL `CROSS JOIN`
     CrossJoin(CrossJoin),
-    /// Repartition the plan based on a partitioning scheme
+    /// Repartitions the input based on a partitioning scheme. This is
+    /// used to add parallelism and is sometimes referred to as an
+    /// "exchange" operator in other systems
     Repartition(Repartition),
-    /// Union multiple inputs
+    /// Union multiple inputs with the same schema into a single
+    /// output stream. This is used to implement SQL `UNION [ALL]` and
+    /// `INTERSECT [ALL]`.
     Union(Union),
-    /// Produces rows from a table provider by reference or from the context
+    /// Produces rows from a [`TableSource`], used to implement SQL
+    /// `FROM` tables or views.
     TableScan(TableScan),
-    /// Produces no rows: An empty relation with an empty schema
+    /// Produces no rows: An empty relation with an empty schema that
+    /// produces 0 or 1 rows. This is used to implement SQL `SELECT`
+    /// that has no values in the `FROM` clause.

Review Comment:
   ```suggestion
       /// Produces no rows: An empty relation with an empty schema that
       /// produces 0 or 1 row. This is used to implement SQL `SELECT`
       /// that has no values in the `FROM` clause.
   ```
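
To illustrate why the `EmptyRelation` docstring mentions "0 or 1" rows: a `SELECT` with no `FROM` clause still needs one input row so that the projected literal appears once in the output. The sketch below models that idea only. `EmptyRelationSketch`, its `produce_one_row` flag, and `project_literal` are hypothetical names for illustration and should not be read as the DataFusion definitions.

```rust
/// Hypothetical stand-in for the plan node; not the DataFusion struct.
struct EmptyRelationSketch {
    /// Assumed flag: emit a single placeholder row instead of none.
    produce_one_row: bool,
}

impl EmptyRelationSketch {
    /// Rows are modeled as empty vectors because the schema has no columns.
    fn rows(&self) -> Vec<Vec<i64>> {
        if self.produce_one_row {
            vec![vec![]]
        } else {
            vec![]
        }
    }
}

/// Models `SELECT <literal>` with no FROM: emit the constant once per input row.
fn project_literal(input: &EmptyRelationSketch, literal: i64) -> Vec<Vec<i64>> {
    input.rows().into_iter().map(|_| vec![literal]).collect()
}

fn main() {
    let one = EmptyRelationSketch { produce_one_row: true };
    let zero = EmptyRelationSketch { produce_one_row: false };
    // With a placeholder row, the literal is emitted once.
    println!("{:?}", project_literal(&one, 1));
    // With no input rows, the projection has nothing to emit.
    println!("{:?}", project_literal(&zero, 1));
}
```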



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
