alamb commented on code in PR #5962:
URL: https://github.com/apache/arrow-datafusion/pull/5962#discussion_r1164586488
##########
docs/source/user-guide/example-usage.md:
##########
@@ -141,3 +141,112 @@ async fn main() -> datafusion::error::Result<()> {
| 1 | 2 |
+---+--------+
```
+
+# Using DataFusion as a library
+
+## Create a new project
+
+```shell
+cargo new hello_datafusion
+```
+
+```shell
+$ cd hello_datafusion
+$ tree .
+.
+├── Cargo.toml
+└── src
+ └── main.rs
+
+1 directory, 2 files
+```
+
+## Default Configuration
+
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
+
+To get started, add the following to your `Cargo.toml` file:
+
+```toml
+[dependencies]
+datafusion = "11.0"
+```
+
+## Create a main function
+
+Update the main.rs file with your first datafusion application based on [Example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
Review Comment:
Yes, that is a good catch. This whole page has some nontrivial
redundancy. I will try to fix it up.
##########
docs/source/user-guide/faq.md:
##########
@@ -29,3 +29,37 @@ model and computational kernels. It is designed to run within a single process,
for parallel query execution.
[Ballista](https://github.com/apache/arrow-ballista) is a distributed compute platform built on DataFusion.
+
+# How does DataFusion Compare with `XYZ`?
+
+When compared to similar systems, DataFusion typically is:
+
+1. Targeted at developers, rather than end users / data scientists.
+2. Designed to be embedded, rather than a complete file based SQL system.
+3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
+4. Implemented in `Rust`, rather than `C/C++`
+
+Here is a comparison with similar projects that may help understand
+when DataFusion might be suitable and unsuitable for your needs:
+
+- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
Review Comment:
in c18786332
##########
docs/source/user-guide/example-usage.md:
##########
@@ -141,3 +141,112 @@ async fn main() -> datafusion::error::Result<()> {
| 1 | 2 |
+---+--------+
```
+
+# Using DataFusion as a library
+
+## Create a new project
+
+```shell
+cargo new hello_datafusion
+```
+
+```shell
+$ cd hello_datafusion
+$ tree .
+.
+├── Cargo.toml
+└── src
+ └── main.rs
+
+1 directory, 2 files
+```
+
+## Default Configuration
+
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
+
+To get started, add the following to your `Cargo.toml` file:
+
+```toml
+[dependencies]
+datafusion = "11.0"
+```
+
+## Create a main function
+
+Update the main.rs file with your first datafusion application based on [Example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
+
+```rust
+use datafusion::prelude::*;
+
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+ // register the table
+ let ctx = SessionContext::new();
+ ctx.register_csv("test", "<PATH_TO_YOUR_CSV_FILE>", CsvReadOptions::new()).await?;
+
+ // create a plan to run a SQL query
+ let df = ctx.sql("SELECT * FROM test").await?;
+
+ // execute and print results
+ df.show().await?;
+ Ok(())
+}
+```
+
+## Extensibility
+
+DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
+
+- [x] User Defined Functions (UDFs)
+- [x] User Defined Aggregate Functions (UDAFs)
+- [x] User Defined Table Source (`TableProvider`) for tables
+- [x] User Defined `Optimizer` passes (plan rewrites)
+- [x] User Defined `LogicalPlan` nodes
+- [x] User Defined `ExecutionPlan` nodes
+
+## Rust Version Compatibility
+
+This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.
+
+## Optimized Configuration
+
+For an optimized build several steps are required. First, use the below in your `Cargo.toml`. It is
+worth noting that using the settings in the `[profile.release]` section will significantly increase the build time.
+
+```toml
+[dependencies]
+datafusion = { version = "11.0" , features = ["simd"]}
Review Comment:
Maybe we could update the script here to automatically clean it up:
https://github.com/apache/arrow-datafusion/blob/main/dev/update_datafusion_versions.py
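
To make the idea concrete, here is a minimal sketch of the kind of rewrite such a script could apply to the docs. The function name `update_doc_versions` and both regular expressions are hypothetical and are not taken from `update_datafusion_versions.py`; the sketch only assumes that the docs pin the version in the two forms quoted in this hunk (`datafusion = "11.0"` and `datafusion = { version = "11.0" , ... }`).

```python
import re

# Hypothetical patterns (assumptions for illustration, not from the real script).
# Matches:  datafusion = "11.0"
_SIMPLE = re.compile(r'^(\s*datafusion\s*=\s*")[^"]*(")', re.MULTILINE)
# Matches:  datafusion = { version = "11.0" , features = ["simd"]}
_TABLE = re.compile(
    r'^(\s*datafusion\s*=\s*\{[^}\n]*?\bversion\s*=\s*")[^"]*(")',
    re.MULTILINE,
)


def update_doc_versions(text: str, new_version: str) -> str:
    """Return `text` with every pinned `datafusion` version set to `new_version`."""

    def _swap(match: re.Match) -> str:
        # Build the replacement by hand so that a version starting with a
        # digit cannot be misread as part of a backreference such as `\12`.
        return match.group(1) + new_version + match.group(2)

    text = _SIMPLE.sub(_swap, text)
    text = _TABLE.sub(_swap, text)
    return text


if __name__ == "__main__":
    sample = (
        'datafusion = "11.0"\n'
        'datafusion = { version = "11.0" , features = ["simd"]}\n'
    )
    print(update_doc_versions(sample, "22.0.0"))
```

Running something like this over `docs/source/user-guide/*.md` as part of the release script would keep the snippets from drifting, though the exact file list and where to hook it in are left open here.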
##########
docs/source/user-guide/example-usage.md:
##########
@@ -141,3 +141,112 @@ async fn main() -> datafusion::error::Result<()> {
| 1 | 2 |
+---+--------+
```
+
+# Using DataFusion as a library
+
+## Create a new project
+
+```shell
+cargo new hello_datafusion
+```
+
+```shell
+$ cd hello_datafusion
+$ tree .
+.
+├── Cargo.toml
+└── src
+ └── main.rs
+
+1 directory, 2 files
+```
+
+## Default Configuration
+
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
+
+To get started, add the following to your `Cargo.toml` file:
+
+```toml
+[dependencies]
+datafusion = "11.0"
+```
+
+## Create a main function
+
+Update the main.rs file with your first datafusion application based on [Example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
+
+```rust
+use datafusion::prelude::*;
+
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+ // register the table
+ let ctx = SessionContext::new();
+ ctx.register_csv("test", "<PATH_TO_YOUR_CSV_FILE>", CsvReadOptions::new()).await?;
+
+ // create a plan to run a SQL query
+ let df = ctx.sql("SELECT * FROM test").await?;
+
+ // execute and print results
+ df.show().await?;
+ Ok(())
+}
+```
Review Comment:
I agree -- removed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.