alamb commented on code in PR #8319:
URL: https://github.com/apache/arrow-datafusion/pull/8319#discussion_r1407614920


##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.

Review Comment:
   ```suggestion
   `DataFrame` in DataFusion is modeled after the Pandas DataFrame interface, and is a thin wrapper over `LogicalPlan` that adds functionality for building and executing those plans.
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+```
+
+construct `DataFrame` manually
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx
+  .table("users")
+  .filter(col("a").lt_eq(col("b")))?
+  .sort(vec![col("a").sort(true, true), col("b").sort(false, false)])?;
+```
+
+## Collect / Streaming Exec
+
+When you have a `DataFrame`, you may want to access the results of the 
internal `LogicalPlan`. You can do this by using `collect` to retrieve all 
outputs at once, or `streaming_exec` to obtain a `SendableRecordBatchStream`.
+
+You can just collect all outputs once like:
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let batches = df.collect().await?;
+```
+
+You can also use stream output to iterate the `RecordBatch`
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let mut stream = df.execute_stream().await?;
+while let Some(rb) = stream.next().await {
+    println!("{rb:?}");
+}
+```
+
+# Write DataFrame to Files
+
+You can also serialize `DataFrame` to a file. For now, `Datafusion` supports 
write `DataFrame` to `csv`, `json` and `parquet`.
+
+Before writing to a file, it will call collect to calculate all the results of 
the DataFrame and then write to file.

Review Comment:
   I don't think the DataFrame API calls collect -- instead I think it uses the 
streaming APIs
   
   ```suggestion
   When writing a file, DataFusion will execute the DataFrame and stream the 
results to a file.
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.

Review Comment:
   ```suggestion
   You can use `collect` or `execute_stream` to execute the query.
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())

Review Comment:
   ```suggestion
   // Build the same plan using the LogicalPlanBuilder
   let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:

Review Comment:
   ```suggestion
   For example, to use `sql` to construct `DataFrame`:
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+```
+
+construct `DataFrame` manually
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx
+  .table("users")
+  .filter(col("a").lt_eq(col("b")))?
+  .sort(vec![col("a").sort(true, true), col("b").sort(false, false)])?;
+```
+
+## Collect / Streaming Exec
+
+When you have a `DataFrame`, you may want to access the results of the 
internal `LogicalPlan`. You can do this by using `collect` to retrieve all 
outputs at once, or `streaming_exec` to obtain a `SendableRecordBatchStream`.
+
+You can just collect all outputs once like:
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let batches = df.collect().await?;
+```
+
+You can also use stream output to iterate the `RecordBatch`

Review Comment:
   ```suggestion
   You can also use stream output to incrementally generate output one `RecordBatch` at a time:
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+```
+
+construct `DataFrame` manually
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx
+  .table("users")
+  .filter(col("a").lt_eq(col("b")))?
+  .sort(vec![col("a").sort(true, true), col("b").sort(false, false)])?;
+```
+
+## Collect / Streaming Exec
+
+When you have a `DataFrame`, you may want to access the results of the 
internal `LogicalPlan`. You can do this by using `collect` to retrieve all 
outputs at once, or `streaming_exec` to obtain a `SendableRecordBatchStream`.
+
+You can just collect all outputs once like:
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let batches = df.collect().await?;
+```
+
+You can also use stream output to iterate the `RecordBatch`
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let mut stream = df.execute_stream().await?;
+while let Some(rb) = stream.next().await {
+    println!("{rb:?}");
+}
+```
+
+# Write DataFrame to Files
+
+You can also serialize `DataFrame` to a file. For now, `Datafusion` supports 
write `DataFrame` to `csv`, `json` and `parquet`.
+
+Before writing to a file, it will call collect to calculate all the results of 
the DataFrame and then write to file.
+
+For example, if you write it to a csv_file
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(mem_table))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+
+dataframe
+    .write_csv("user_dataframe.csv", DataFrameWriteOptions::default(), None)
+    .await;
+```
+
+and the file will look like (Example Output):
+
+```
+id,bank_account
+1,9000
+```
+
+## Transform between LogicalPlan and DataFrame
+
+As it is showed above, `DataFrame` is just a very thin wrapper of 
`LogicalPlan`, so you can easily go back and forth between them.

Review Comment:
   ```suggestion
   As shown above, `DataFrame` is just a very thin wrapper of `LogicalPlan`, so 
you can easily go back and forth between them.
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:

Review Comment:
   ```suggestion
   You can build up `DataFrame`s using their methods, similarly to building `LogicalPlan`s using `LogicalPlanBuilder`:
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:

Review Comment:
   ```suggestion
   You can directly use the `DataFrame` API or generate a `DataFrame` from a 
SQL query.
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?

Review Comment:
   ```suggestion
   // Create a new DataFrame with the `id` and `bank_account` columns, sorted by `id`
   let new_df = df.select(vec![col("id"), col("bank_account")])?
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+```
+
+construct `DataFrame` manually
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx
+  .table("users")
+  .filter(col("a").lt_eq(col("b")))?
+  .sort(vec![col("a").sort(true, true), col("b").sort(false, false)])?;
+```
+
+## Collect / Streaming Exec
+
+When you have a `DataFrame`, you may want to access the results of the 
internal `LogicalPlan`. You can do this by using `collect` to retrieve all 
outputs at once, or `streaming_exec` to obtain a `SendableRecordBatchStream`.

Review Comment:
   ```suggestion
   DataFusion `DataFrame`s are "lazy", meaning they do not do any processing until they are executed, which allows for additional optimizations.
   
   When you have a `DataFrame`, you can run it in one of three ways:

   1. `collect`, which executes the query and buffers all the output into a `Vec<RecordBatch>`
   2. `execute_stream`, which begins execution and returns a `SendableRecordBatchStream` that incrementally computes output on each call to `next()`
   3. `cache`, which executes the query and buffers the output into a new in-memory `DataFrame`
   ```
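
To make the "lazy" point concrete, here is a toy sketch in plain Rust that may help readers of the guide. It is not DataFusion's implementation and uses none of its API: `LazyFrame`, its `filter`, `stream` and `collect` methods, and the `evaluated` counter are all hypothetical names invented for illustration. It only shows the general idea that building a plan does no work, streaming does work one result at a time, and collecting buffers everything.

```rust
// Toy model of lazy evaluation. NOT DataFusion's implementation:
// `LazyFrame` and all of its methods are hypothetical, for illustration only.
use std::cell::Cell;
use std::rc::Rc;

/// A "plan": source rows plus a list of pending filter steps.
struct LazyFrame {
    rows: Vec<i64>,
    filters: Vec<Box<dyn Fn(&i64) -> bool>>,
    /// Counts rows actually examined, so laziness is observable.
    evaluated: Rc<Cell<usize>>,
}

impl LazyFrame {
    fn new(rows: Vec<i64>) -> Self {
        LazyFrame {
            rows,
            filters: Vec::new(),
            evaluated: Rc::new(Cell::new(0)),
        }
    }

    /// Records a filter step in the plan; touches no data.
    fn filter(mut self, pred: impl Fn(&i64) -> bool + 'static) -> Self {
        self.filters.push(Box::new(pred));
        self
    }

    /// Streaming-style execution: work happens on each call to `next()`.
    fn stream(self) -> impl Iterator<Item = i64> {
        let LazyFrame { rows, filters, evaluated } = self;
        rows.into_iter().filter(move |row| {
            evaluated.set(evaluated.get() + 1);
            filters.iter().all(|f| f(row))
        })
    }

    /// Collect-style execution: runs the whole plan and buffers the output.
    fn collect(self) -> Vec<i64> {
        self.stream().collect()
    }
}

fn main() {
    let df = LazyFrame::new(vec![1, 2, 3, 4, 5, 6]).filter(|v| v % 2 == 0);
    let counter = Rc::clone(&df.evaluated);
    // Building the plan examined no rows.
    assert_eq!(counter.get(), 0);

    // Pulling one result examines only the rows up to the first match.
    let mut stream = df.stream();
    assert_eq!(stream.next(), Some(2));
    assert_eq!(counter.get(), 2);

    // Draining the stream finishes the remaining work.
    let rest: Vec<i64> = stream.collect();
    assert_eq!(rest, vec![4, 6]);
    assert_eq!(counter.get(), 6);

    // Collect-style execution buffers every result at once.
    let all = LazyFrame::new(vec![1, 2, 3]).filter(|v| *v > 1).collect();
    assert_eq!(all, vec![2, 3]);
}
```

In the real API the analogous split is between `collect` (buffer everything) and `execute_stream` (pull batches one at a time); the toy above only mirrors that shape, with single integers standing in for record batches.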



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+```
+
+construct `DataFrame` manually

Review Comment:
   ```suggestion
   To construct `DataFrame` using the API:
   ```



##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,123 @@
 
 # Using the DataFrame API
 
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over 
LogicalPlan.
+
+```rust
+pub struct DataFrame {
+    session_state: SessionState,
+    plan: LogicalPlan,
+}
+```
+
+For both `DataFrame` and `LogicalPlan`, you can build the query manually, such 
as:
+
+```rust
+let df = ctx.table("users").await?;
+
+let new_df = df.select(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?;
+
+let plan = LogicalPlanBuilder::from(&df.to_logical_plan())
+    .project(vec![col("id"), col("bank_account")])?
+    .sort(vec![col("id")])?
+    .build()?;
+```
+
+But The main difference between a DataFrame and a LogicalPlan is that the 
DataFrame contains functionality for executing queries rather than just 
building plans.
+
+You can use `collect` or `execute_stream` to execute the query.
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a 
`DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+```
+
+construct `DataFrame` manually
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx
+  .table("users")
+  .filter(col("a").lt_eq(col("b")))?
+  .sort(vec![col("a").sort(true, true), col("b").sort(false, false)])?;
+```
+
+## Collect / Streaming Exec
+
+When you have a `DataFrame`, you may want to access the results of the 
internal `LogicalPlan`. You can do this by using `collect` to retrieve all 
outputs at once, or `streaming_exec` to obtain a `SendableRecordBatchStream`.
+
+You can just collect all outputs once like:
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let batches = df.collect().await?;
+```
+
+You can also use stream output to iterate the `RecordBatch`
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let mut stream = df.execute_stream().await?;
+while let Some(rb) = stream.next().await {
+    println!("{rb:?}");
+}
+```
+
+# Write DataFrame to Files
+
+You can also serialize `DataFrame` to a file. For now, `Datafusion` supports 
write `DataFrame` to `csv`, `json` and `parquet`.
+
+Before writing to a file, it will call collect to calculate all the results of 
the DataFrame and then write to file.
+
+For example, if you write it to a csv_file

Review Comment:
   ```suggestion
   For example, to write a CSV file:
   ```


