This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git
The following commit(s) were added to refs/heads/main by this push:
new 75dea3d Minor docs updates (#210)
75dea3d is described below
commit 75dea3dbd530421821becc3642e43036d1a3c121
Author: Andy Grove <[email protected]>
AuthorDate: Wed Feb 22 07:05:17 2023 -0700
Minor docs updates (#210)
* Add cuDF to examples
* lint
---
README.md | 29 +++++++++++++++--------------
examples/README.md | 19 ++++++++++---------
examples/sql-on-cudf.py | 4 +---
3 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/README.md b/README.md
index e78f613..d83b78c 100644
--- a/README.md
+++ b/README.md
@@ -28,26 +28,26 @@ DataFusion's Python bindings can be used as an end-user tool as well as providin
## Features
-- Execute queries using SQL or DataFrames against CSV, Parquet, and JSON data sources
-- Queries are optimized using DataFusion's query optimizer
-- Execute user-defined Python code from SQL
-- Exchange data with Pandas and other DataFrame libraries that support PyArrow
-- Serialize and deserialize query plans in Substrait format
-- Experimental support for executing SQL queries against Polars, Pandas and cuDF
+- Execute queries using SQL or DataFrames against CSV, Parquet, and JSON data sources.
+- Queries are optimized using DataFusion's query optimizer.
+- Execute user-defined Python code from SQL.
+- Exchange data with Pandas and other DataFrame libraries that support PyArrow.
+- Serialize and deserialize query plans in Substrait format.
+- Experimental support for transpiling SQL queries to DataFrame calls with Polars, Pandas, and cuDF.
## Comparison with other projects
-Here is a comparison with similar projects that may help understand when DataFusion might be suitable and unsuitable
+Here is a comparison with similar projects that may help understand when DataFusion might be suitable and unsuitable
for your needs:
-- [DuckDB](http://www.duckdb.org/) is an open source, in-process analytic database. Like DataFusion, it supports
- very fast execution, both from its custom file format and directly from Parquet files. Unlike DataFusion, it is
- written in C/C++ and it is primarily used directly by users as a serverless database and query system rather than
- as a library for building such database systems.
+- [DuckDB](http://www.duckdb.org/) is an open source, in-process analytic database. Like DataFusion, it supports
+ very fast execution, both from its custom file format and directly from Parquet files. Unlike DataFusion, it is
+ written in C/C++ and it is primarily used directly by users as a serverless database and query system rather than
+ as a library for building such database systems.
-- [Polars](http://pola.rs/) is one of the fastest DataFrame libraries at the time of writing. Like DataFusion, it
- is also written in Rust and uses the Apache Arrow memory model, but unlike DataFusion it does not provide full SQL
- support, nor as many extension points.
+- [Polars](http://pola.rs/) is one of the fastest DataFrame libraries at the time of writing. Like DataFusion, it
+ is also written in Rust and uses the Apache Arrow memory model, but unlike DataFusion it does not provide full SQL
+ support, nor as many extension points.
## Example Usage
@@ -110,6 +110,7 @@ See [examples](examples/README.md) for more information.
- [Executing SQL on Polars](./examples/sql-on-polars.py)
- [Executing SQL on Pandas](./examples/sql-on-pandas.py)
+- [Executing SQL on cuDF](./examples/sql-on-cudf.py)
## How to install (from pip)
diff --git a/examples/README.md b/examples/README.md
index ce98600..2c4775e 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -29,21 +29,22 @@ Here is a direct link to the file used in the examples:
### Executing Queries with DataFusion
-- [Query a Parquet file using SQL](./examples/sql-parquet.py)
-- [Query a Parquet file using the DataFrame API](./examples/dataframe-parquet.py)
-- [Run a SQL query and store the results in a Pandas DataFrame](./examples/sql-to-pandas.py)
-- [Query PyArrow Data](./examples/query-pyarrow-data.py)
+- [Query a Parquet file using SQL](./sql-parquet.py)
+- [Query a Parquet file using the DataFrame API](./dataframe-parquet.py)
+- [Run a SQL query and store the results in a Pandas DataFrame](./sql-to-pandas.py)
+- [Query PyArrow Data](./query-pyarrow-data.py)
### Running User-Defined Python Code
-- [Register a Python UDF with DataFusion](./examples/python-udf.py)
-- [Register a Python UDAF with DataFusion](./examples/python-udaf.py)
+- [Register a Python UDF with DataFusion](./python-udf.py)
+- [Register a Python UDAF with DataFusion](./python-udaf.py)
### Substrait Support
-- [Serialize query plans using Substrait](./examples/substrait.py)
+- [Serialize query plans using Substrait](./substrait.py)
### Executing SQL against DataFrame Libraries (Experimental)
-- [Executing SQL on Polars](./examples/sql-on-polars.py)
-- [Executing SQL on Pandas](./examples/sql-on-pandas.py)
+- [Executing SQL on Polars](./sql-on-polars.py)
+- [Executing SQL on Pandas](./sql-on-pandas.py)
+- [Executing SQL on cuDF](./sql-on-cudf.py)
diff --git a/examples/sql-on-cudf.py b/examples/sql-on-cudf.py
index 407cb1f..999756f 100644
--- a/examples/sql-on-cudf.py
+++ b/examples/sql-on-cudf.py
@@ -19,8 +19,6 @@ from datafusion.cudf import SessionContext
ctx = SessionContext()
-ctx.register_parquet(
- "taxi", "/home/jeremy/Downloads/yellow_tripdata_2021-01.parquet"
-)
+ctx.register_parquet("taxi", "yellow_tripdata_2021-01.parquet")
df = ctx.sql("select passenger_count from taxi")
print(df)