This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 3c87b23655 Improve datafuson-proto and datafusion-substrait docs 
(#9260)
3c87b23655 is described below

commit 3c87b236553ffc1cff84dee34cb5303c60be2afc
Author: Andrew Lamb <[email protected]>
AuthorDate: Tue Feb 20 20:04:10 2024 -0500

    Improve datafuson-proto and datafusion-substrait docs (#9260)
    
    * Clean up documentation for datafusion-substrait and datafusion-proto
    
    * Port datafusion-proto examples to rustdocs
    
    * Cleanup substrait docs and examples
    
    * prettier
    
    * Apply suggestions from code review
    
    Co-authored-by: Jeffrey Vo <[email protected]>
    Co-authored-by: Ruihang Xia <[email protected]>
    
    * Fix readme links
    
    * Update protoc docs
    
    * Add compatibilty note
    
    * add plan serialization use
    
    * prettier
    
    ---------
    
    Co-authored-by: Jeffrey Vo <[email protected]>
    Co-authored-by: Ruihang Xia <[email protected]>
---
 .../{substrait/README.md => proto/README-dev.md}   |  20 ++--
 datafusion/proto/README.md                         | 122 ++-------------------
 datafusion/proto/examples/expr_serde.rs            |  33 ------
 datafusion/proto/examples/logical_plan_serde.rs    |  32 ------
 datafusion/proto/examples/physical_plan_serde.rs   |  36 ------
 datafusion/proto/src/lib.rs                        |  98 ++++++++++++++++-
 datafusion/substrait/README.md                     |  18 +--
 datafusion/substrait/src/lib.rs                    |  57 ++++++++++
 8 files changed, 174 insertions(+), 242 deletions(-)

diff --git a/datafusion/substrait/README.md b/datafusion/proto/README-dev.md
similarity index 50%
copy from datafusion/substrait/README.md
copy to datafusion/proto/README-dev.md
index 17591cdd62..b793b47e76 100644
--- a/datafusion/substrait/README.md
+++ b/datafusion/proto/README-dev.md
@@ -17,18 +17,16 @@
   under the License.
 -->
 
-# DataFusion + Substrait
+## Generated Code
 
-[Substrait](https://substrait.io/) provides a cross-language serialization 
format for relational algebra, based on
-protocol buffers.
+The prost/tonic code can be generated by running `./regen.sh`, which in turn 
invokes the Rust binary located in [gen](./gen)
 
-This repository provides a Substrait producer and consumer for DataFusion:
+This is necessary after modifying the protobuf definitions or altering the 
dependencies of [gen](./gen), and requires a
+valid installation of [protoc] (see [installation instructions] for details).
 
-- The producer converts a DataFusion logical plan into a Substrait protobuf.
-- The consumer converts a Substrait protobuf into a DataFusion logical plan.
+```bash
+./regen.sh
+```
 
-Potential uses of this crate:
-
-- Replace the current [DataFusion protobuf 
definition](https://github.com/apache/arrow-datafusion/blob/main/datafusion/proto/proto/datafusion.proto)
 used in Ballista for passing query plan fragments to executors
-- Make it easier to pass query plans over FFI boundaries, such as from Python 
to Rust
-- Allow Apache Calcite query plans to be executed in DataFusion
+[protoc]: 
https://github.com/protocolbuffers/protobuf#protocol-compiler-installation
+[installation instructions]: 
https://arrow.apache.org/datafusion/contributor-guide/#protoc-installation
diff --git a/datafusion/proto/README.md b/datafusion/proto/README.md
index 8d25f193fa..ca6ae7fc68 100644
--- a/datafusion/proto/README.md
+++ b/datafusion/proto/README.md
@@ -17,121 +17,13 @@
   under the License.
 -->
 
-# Apache Arrow DataFusion Proto
+# `datafusion-proto`: Apache Arrow DataFusion Protobuf Serialization / 
Deserialization
 
-Apache Arrow [DataFusion][df] is an extensible query execution framework,
-written in Rust, that uses Apache Arrow as its in-memory format.
+This crate contains code to convert Apache Arrow [DataFusion] plans to and from
+bytes, which can be useful for sending plans over the network, for example
+when building a distributed query engine.
 
-This crate provides support format for serializing and deserializing the
-following structures to and from bytes:
+See [API Docs] for details and examples.
 
-1. [`LogicalPlan`]'s (including [`Expr`]),
-2. [`ExecutionPlan`]s (including [`PhysiscalExpr`])
-
-This format can be useful for sending plans over the network, for example when
-building a distributed query engine.
-
-Internally, this crate is implemented by converting the plans to [protocol
-buffers] using [prost].
-
-[protocol buffers]: https://developers.google.com/protocol-buffers
-[`logicalplan`]: 
https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html
-[`expr`]: 
https://docs.rs/datafusion/latest/datafusion/logical_expr/expr/enum.Expr.html
-[`executionplan`]: 
https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html
-[`physiscalexpr`]: 
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
-[prost]: https://docs.rs/prost/latest/prost/
-
-## See Also
-
-The binary format created by this crate supports the full range of DataFusion
-plans, but is DataFusion specific. See [datafusion-substrait] which can encode
-many DataFusion plans using the [substrait.io] standard.
-
-[datafusion-substrait]: 
https://docs.rs/datafusion-substrait/latest/datafusion_substrait
-[substrait.io]: https://substrait.io
-
-# Examples
-
-## Serializing Expressions
-
-Based on [examples/expr_serde.rs](examples/expr_serde.rs)
-
-```rust
-use datafusion_common::Result;
-use datafusion_expr::{col, lit, Expr};
-use datafusion_proto::bytes::Serializeable;
-
-fn main() -> Result<()> {
-    // Create a new `Expr` a < 32
-    let expr = col("a").lt(lit(5i32));
-
-    // Convert it to an opaque form
-    let bytes = expr.to_bytes()?;
-
-    // Decode bytes from somewhere (over network, etc.)
-    let decoded_expr = Expr::from_bytes(&bytes)?;
-    assert_eq!(expr, decoded_expr);
-    Ok(())
-}
-```
-
-## Serializing Logical Plans
-
-Based on [examples/logical_plan_serde.rs](examples/logical_plan_serde.rs)
-
-```rust
-use datafusion::prelude::*;
-use datafusion_common::Result;
-use datafusion_proto::bytes::{logical_plan_from_bytes, logical_plan_to_bytes};
-
-#[tokio::main]
-async fn main() -> Result<()> {
-    let ctx = SessionContext::new();
-    ctx.register_csv("t1", "tests/testdata/test.csv", 
CsvReadOptions::default())
-        .await
-        ?;
-    let plan = ctx.table("t1").await?.into_optimized_plan()?;
-    let bytes = logical_plan_to_bytes(&plan)?;
-    let logical_round_trip = logical_plan_from_bytes(&bytes, &ctx)?;
-    assert_eq!(format!("{:?}", plan), format!("{:?}", logical_round_trip));
-    Ok(())
-}
-```
-
-## Serializing Physical Plans
-
-Based on [examples/physical_plan_serde.rs](examples/physical_plan_serde.rs)
-
-```rust
-use datafusion::prelude::*;
-use datafusion_common::Result;
-use datafusion_proto::bytes::{physical_plan_from_bytes,physical_plan_to_bytes};
-
-#[tokio::main]
-async fn main() -> Result<()> {
-    let ctx = SessionContext::new();
-    ctx.register_csv("t1", "tests/testdata/test.csv", 
CsvReadOptions::default())
-        .await
-        ?;
-    let logical_plan = ctx.table("t1").await?.into_optimized_plan()?;
-    let physical_plan = ctx.create_physical_plan(&logical_plan).await?;
-    let bytes = physical_plan_to_bytes(physical_plan.clone())?;
-    let physical_round_trip = physical_plan_from_bytes(&bytes, &ctx)?;
-    assert_eq!(format!("{:?}", physical_plan), format!("{:?}", 
physical_round_trip));
-    Ok(())
-}
-
-```
-
-## Generated Code
-
-The prost/tonic code can be generated by running, which in turn invokes the 
Rust binary located in [gen](./gen)
-
-This is necessary after modifying the protobuf definitions or altering the 
dependencies of [gen](./gen), and requires a
-valid installation of 
[protoc](https://github.com/protocolbuffers/protobuf#protocol-compiler-installation).
-
-```bash
-./regen.sh
-```
-
-[df]: https://crates.io/crates/datafusion
+[datafusion]: https://arrow.apache.org/datafusion
+[api docs]: http://docs.rs/datafusion-proto/latest
diff --git a/datafusion/proto/examples/expr_serde.rs 
b/datafusion/proto/examples/expr_serde.rs
deleted file mode 100644
index 9da64f87e2..0000000000
--- a/datafusion/proto/examples/expr_serde.rs
+++ /dev/null
@@ -1,33 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-use datafusion_common::Result;
-use datafusion_expr::{col, lit, Expr};
-use datafusion_proto::bytes::Serializeable;
-
-fn main() -> Result<()> {
-    // Create a new `Expr` a < 32
-    let expr = col("a").lt(lit(5i32));
-
-    // Convert it to an opaque form
-    let bytes = expr.to_bytes()?;
-
-    // Decode bytes from somewhere (over network, etc.)
-    let decoded_expr = Expr::from_bytes(&bytes)?;
-    assert_eq!(expr, decoded_expr);
-    Ok(())
-}
diff --git a/datafusion/proto/examples/logical_plan_serde.rs 
b/datafusion/proto/examples/logical_plan_serde.rs
deleted file mode 100644
index 9f468638c1..0000000000
--- a/datafusion/proto/examples/logical_plan_serde.rs
+++ /dev/null
@@ -1,32 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-use datafusion::prelude::*;
-use datafusion_common::Result;
-use datafusion_proto::bytes::{logical_plan_from_bytes, logical_plan_to_bytes};
-
-#[tokio::main]
-async fn main() -> Result<()> {
-    let ctx = SessionContext::new();
-    ctx.register_csv("t1", "testdata/test.csv", CsvReadOptions::default())
-        .await?;
-    let plan = ctx.table("t1").await?.into_optimized_plan()?;
-    let bytes = logical_plan_to_bytes(&plan)?;
-    let logical_round_trip = logical_plan_from_bytes(&bytes, &ctx)?;
-    assert_eq!(format!("{plan:?}"), format!("{logical_round_trip:?}"));
-    Ok(())
-}
diff --git a/datafusion/proto/examples/physical_plan_serde.rs 
b/datafusion/proto/examples/physical_plan_serde.rs
deleted file mode 100644
index 72e216074a..0000000000
--- a/datafusion/proto/examples/physical_plan_serde.rs
+++ /dev/null
@@ -1,36 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-use datafusion::prelude::*;
-use datafusion_common::Result;
-use datafusion_proto::bytes::{physical_plan_from_bytes, 
physical_plan_to_bytes};
-
-#[tokio::main]
-async fn main() -> Result<()> {
-    let ctx = SessionContext::new();
-    ctx.register_csv("t1", "testdata/test.csv", CsvReadOptions::default())
-        .await?;
-    let dataframe = ctx.table("t1").await?;
-    let physical_plan = dataframe.create_physical_plan().await?;
-    let bytes = physical_plan_to_bytes(physical_plan.clone())?;
-    let physical_round_trip = physical_plan_from_bytes(&bytes, &ctx)?;
-    assert_eq!(
-        format!("{physical_plan:?}"),
-        format!("{physical_round_trip:?}")
-    );
-    Ok(())
-}
diff --git a/datafusion/proto/src/lib.rs b/datafusion/proto/src/lib.rs
index 872c26408e..5d60b9b574 100644
--- a/datafusion/proto/src/lib.rs
+++ b/datafusion/proto/src/lib.rs
@@ -15,8 +15,102 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Serde code for logical plans and expressions.
-
+//! Serialize / Deserialize DataFusion Plans to bytes
+//!
+//! This crate provides support for serializing and deserializing the
+//! following structures to and from bytes:
+//!
+//! 1. [`LogicalPlan`]'s (including [`Expr`]),
+//! 2. [`ExecutionPlan`]s (including [`PhysicalExpr`])
+//!
+//! [`LogicalPlan`]: datafusion_expr::LogicalPlan
+//! [`Expr`]: datafusion_expr::Expr
+//! [`ExecutionPlan`]: datafusion::physical_plan::ExecutionPlan
+//! [`PhysicalExpr`]: datafusion::physical_expr::PhysicalExpr
+//!
+//! Internally, this crate is implemented by converting the plans to [protocol
+//! buffers] using [prost].
+//!
+//! [protocol buffers]: https://developers.google.com/protocol-buffers
+//! [prost]: https://docs.rs/prost/latest/prost/
+//!
+//! # Version Compatibility
+//!
+//! The serialized form are not guaranteed to be compatible across
+//! DataFusion versions. A plan serialized with one version of DataFusion
+//! may not be able to deserialized with a different version.
+//!
+//! # See Also
+//!
+//! The binary format created by this crate supports the full range of 
DataFusion
+//! plans, but is DataFusion specific. See [datafusion-substrait] for a crate
+//! which can encode many DataFusion plans using the [substrait.io] standard.
+//!
+//! [datafusion-substrait]: 
https://docs.rs/datafusion-substrait/latest/datafusion_substrait
+//! [substrait.io]: https://substrait.io
+//!
+//! # Example: Serializing [`Expr`]s
+//! ```
+//! # use datafusion_common::Result;
+//! # use datafusion_expr::{col, lit, Expr};
+//! # use datafusion_proto::bytes::Serializeable;
+//! # fn main() -> Result<()>{
+//!  // Create a new `Expr` a < 32
+//!  let expr = col("a").lt(lit(5i32));
+//!
+//!  // Convert it to bytes (for sending over the network, etc.)
+//!  let bytes = expr.to_bytes()?;
+//!
+//!  // Decode bytes from somewhere (over network, etc.) back to Expr
+//!  let decoded_expr = Expr::from_bytes(&bytes)?;
+//!  assert_eq!(expr, decoded_expr);
+//! # Ok(())
+//! # }
+//! ```
+//!
+//! # Example: Serializing [`LogicalPlan`]s
+//! ```
+//! # use datafusion::prelude::*;
+//! # use datafusion_common::Result;
+//! # use datafusion_proto::bytes::{logical_plan_from_bytes, 
logical_plan_to_bytes};
+//! # #[tokio::main]
+//! # async fn main() -> Result<()>{
+//!  // Create a plan that scans table 't'
+//!  let ctx = SessionContext::new();
+//!  ctx.register_csv("t1", "tests/testdata/test.csv", 
CsvReadOptions::default()).await?;
+//!  let plan = ctx.table("t1").await?.into_optimized_plan()?;
+//!
+//!  // Convert the plan into bytes (for sending over the network, etc.)
+//!  let bytes = logical_plan_to_bytes(&plan)?;
+//!
+//!  // Decode bytes from somewhere (over network, etc.) back to LogicalPlan
+//!  let logical_round_trip = logical_plan_from_bytes(&bytes, &ctx)?;
+//!  assert_eq!(format!("{:?}", plan), format!("{:?}", logical_round_trip));
+//! # Ok(())
+//! # }
+//! ```
+//! # Example: Serializing [`ExecutionPlan`]s
+//!
+//! ```
+//! # use datafusion::prelude::*;
+//! # use datafusion_common::Result;
+//! # use 
datafusion_proto::bytes::{physical_plan_from_bytes,physical_plan_to_bytes};
+//! # #[tokio::main]
+//! # async fn main() -> Result<()>{
+//!  // Create a plan that scans table 't'
+//!  let ctx = SessionContext::new();
+//!  ctx.register_csv("t1", "tests/testdata/test.csv", 
CsvReadOptions::default()).await?;
+//!  let physical_plan = ctx.table("t1").await?.create_physical_plan().await?;
+//!
+//!  // Convert the plan into bytes (for sending over the network, etc.)
+//!  let bytes = physical_plan_to_bytes(physical_plan.clone())?;
+//!
+//!  // Decode bytes from somewhere (over network, etc.) back to ExecutionPlan
+//!  let physical_round_trip = physical_plan_from_bytes(&bytes, &ctx)?;
+//!  assert_eq!(format!("{:?}", physical_plan), format!("{:?}", 
physical_round_trip));
+//! # Ok(())
+//! # }
+//! ```
 pub mod bytes;
 pub mod common;
 pub mod generated;
diff --git a/datafusion/substrait/README.md b/datafusion/substrait/README.md
index 17591cdd62..a9f2ba4c3c 100644
--- a/datafusion/substrait/README.md
+++ b/datafusion/substrait/README.md
@@ -17,18 +17,10 @@
   under the License.
 -->
 
-# DataFusion + Substrait
+# Apache Arrow DataFusion Substrait
 
-[Substrait](https://substrait.io/) provides a cross-language serialization 
format for relational algebra, based on
-protocol buffers.
+This crate contains a [Substrait] producer and consumer for Apache Arrow
+[DataFusion] plans. See [API Docs] for details and examples.
 
-This repository provides a Substrait producer and consumer for DataFusion:
-
-- The producer converts a DataFusion logical plan into a Substrait protobuf.
-- The consumer converts a Substrait protobuf into a DataFusion logical plan.
-
-Potential uses of this crate:
-
-- Replace the current [DataFusion protobuf 
definition](https://github.com/apache/arrow-datafusion/blob/main/datafusion/proto/proto/datafusion.proto)
 used in Ballista for passing query plan fragments to executors
-- Make it easier to pass query plans over FFI boundaries, such as from Python 
to Rust
-- Allow Apache Calcite query plans to be executed in DataFusion
+[substrait]: https://substrait.io
+[api docs]: https://docs.rs/datafusion-substrait/latest
diff --git a/datafusion/substrait/src/lib.rs b/datafusion/substrait/src/lib.rs
index 432553ec79..454f0e7b7c 100644
--- a/datafusion/substrait/src/lib.rs
+++ b/datafusion/substrait/src/lib.rs
@@ -15,6 +15,63 @@
 // specific language governing permissions and limitations
 // under the License.
 
+//! Serialize / Deserialize DataFusion Plans to [Substrait.io]
+//!
+//! This crate provides support for serializing and deserializing both 
DataFusion
+//! [`LogicalPlan`] and [`ExecutionPlan`] to and from the generated types in
+//! [substrait::proto] from the [substrait] crate.
+//!
+//! [Substrait.io] provides a cross-language serialization format for 
relational
+//! algebra (e.g. query plans and expressions), based on protocol buffers.
+//!
+//! [Substrait.io]: https://substrait.io/
+//!
+//! [`LogicalPlan`]: datafusion::logical_expr::LogicalPlan
+//! [`ExecutionPlan`]: datafusion::physical_plan::ExecutionPlan
+//!
+//! Potential uses of this crate:
+//! * Use DataFusion to run Substrait plans created by other systems (e.g. 
Apache Calcite)
+//! * Use DataFusion to create plans to run on other systems
+//! * Pass query plans over FFI boundaries, such as from Python to Rust
+//! * Pass query plans across node boundaries
+//!
+//! # See Also
+//!
+//! Substrait does not (yet) support the full range of plans and expressions
+//! that DataFusion offers. See the [datafusion-proto]  crate for a DataFusion
+//! specific format that does support of the full range.
+//!
+//! [datafusion-proto]: 
https://docs.rs/datafusion-proto/latest/datafusion_proto
+//!
+//! Note that generated types  such as [`substrait::proto::Plan`] and
+//! [`substrait::proto::Rel`] can be serialized / deserialized to bytes, JSON 
and
+//! other formats using [prost] and the rest of the Rust protobuf ecosystem.
+//!
+//! # Example: Serializing [`LogicalPlan`]s
+//! ```
+//! # use datafusion::prelude::*;
+//! # use datafusion::error::Result;
+//! # #[tokio::main(flavor = "current_thread")]
+//! # async fn main() -> Result<()>{
+//! # use std::sync::Arc;
+//! # use datafusion::arrow::array::{Int32Array, RecordBatch};
+//! # use datafusion_substrait::logical_plan;
+//! // Create a plan that scans table 't'
+//!  let ctx = SessionContext::new();
+//!  let batch = RecordBatch::try_from_iter(vec![("x", 
Arc::new(Int32Array::from(vec![42])) as _)])?;
+//!  ctx.register_batch("t", batch)?;
+//!  let df = ctx.sql("SELECT x from t").await?;
+//!  let plan = df.into_optimized_plan()?;
+//!
+//!  // Convert the plan into a substrait (protobuf) Plan
+//!  let substrait_plan = logical_plan::producer::to_substrait_plan(&plan, 
&ctx)?;
+//!
+//!  // Receive a substrait protobuf from somewhere, and turn it into a 
LogicalPlan
+//!  let logical_round_trip = 
logical_plan::consumer::from_substrait_plan(&ctx, &substrait_plan).await?;
+//!  assert_eq!(format!("{:?}", plan), format!("{:?}", logical_round_trip));
+//! # Ok(())
+//! # }
+//! ```
 pub mod logical_plan;
 pub mod physical_plan;
 pub mod serializer;

Reply via email to