Weijun-H commented on code in PR #8319:
URL: https://github.com/apache/arrow-datafusion/pull/8319#discussion_r1405474559
##########
docs/source/library-user-guide/using-the-dataframe-api.md:
##########
@@ -19,4 +19,105 @@
# Using the DataFrame API
-Coming Soon
+## What is a DataFrame
+
+`DataFrame` is a basic concept in `datafusion` and is only a thin wrapper over LogicalPlan.
+
+```rust
+pub struct DataFrame {
+ session_state: SessionState,
+ plan: LogicalPlan,
+}
+```
+
+## How to generate a DataFrame
+
+You can manually call the `DataFrame` API or automatically generate a `DataFrame` through the SQL query planner just like:
+
+use `sql` to construct `DataFrame`:
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx.sql("SELECT * FROM users;").await?;
+```
+
+construct `DataFrame` manually
+
+```rust
+let ctx = SessionContext::new();
+// Register the in-memory table containing the data
+ctx.register_table("users", Arc::new(create_memtable()?))?;
+let dataframe = ctx
+ .table("users")
+ .filter(col("a").lt_eq(col("b")))?
+ .sort(vec![col("a").sort(true, true), col("b").sort(false, false)])?;
+```
+
+## Collect / Streaming Exec
+
+When you have a `DataFrame`, you may want to access the results of the internal `LogicalPlan`. You can do this by using `collect` to retrieve all outputs at once, or `streaming_exec` to obtain a `SendableRecordBatchStream`.
+
+You can just collect all outputs once like:
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let batches = df.collect().await?;
+```
+
+You can also use stream output to iterate the `RecordBatch`
+
+```rust
+let ctx = SessionContext::new();
+let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+let mut stream = df.execute_stream().await?;
+while let Some(rb) = stream.next().await {
+ println!("{rb:?}");
+}
+```
+
+# Write DataFrame to Files
+
+You can also serializate `DataFrame` to a file. For now, `Datafusion` supports write `DataFrame` to `csv`, `json` and `parquet`.
Review Comment:
```suggestion
You can also serialize a `DataFrame` to a file. For now, `DataFusion` supports
writing a `DataFrame` to `csv`, `json` and `parquet`.
```