This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new e51756a33 ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1
e51756a33 is described below
commit e51756a3378b239f31d5dd9845acb4b518aedb06
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Aug 23 19:56:40 2023 -0700
ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1
### What changes were proposed in this pull request?
This PR aims to update the Python documentation with the latest PyArrow 13.0.0
and Dask 2023.8.1.
### Why are the changes needed?
To recommend the latest versions.
### How was this patch tested?
Manually generated the docs, since this is a documentation-only change.


Closes #1597 from dongjoon-hyun/ORC-1491.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
site/_docs/dask.md | 18 ++++++++++++++----
site/_docs/pyarrow.md | 11 +++++++++--
2 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/site/_docs/dask.md b/site/_docs/dask.md
index 963684b08..7719e7d4c 100644
--- a/site/_docs/dask.md
+++ b/site/_docs/dask.md
@@ -9,7 +9,7 @@ permalink: /docs/dask.html
[Dask](https://dask.org) also supports Apache ORC.
```
-pip3 install "dask[dataframe]==2022.2.0"
+pip3 install "dask[dataframe]==2023.8.1"
pip3 install pandas
```
@@ -20,15 +20,25 @@ In [1]: import pandas as pd
In [2]: import dask.dataframe as dd
-In [3]: pf = pd.DataFrame(data={"col1": [1, 2, 3]})
+In [3]: pf = pd.DataFrame(data={"col1": [1, 2, 3], "col2": ["a", "b", None]})
In [4]: dd.to_orc(dd.from_pandas(pf, npartitions=2), path="/tmp/orc")
-Out[4]: (None,)
+Out[4]: (None, None)
In [5]: dd.read_orc(path="/tmp/orc").compute()
Out[5]:
+ col1 col2
+0 1 a
+1 2 b
+0 3 <NA>
+
+In [6]: dd.read_orc(path="/tmp/orc", columns=["col1"]).compute()
+Out[6]:
col1
0 1
1 2
-2 3
+0 3
```
+
+[10 Minutes to Dask](https://docs.dask.org/en/stable/10-minutes-to-dask.html) page
+provides a short overview.
diff --git a/site/_docs/pyarrow.md b/site/_docs/pyarrow.md
index f248563f1..fca23797f 100644
--- a/site/_docs/pyarrow.md
+++ b/site/_docs/pyarrow.md
@@ -9,7 +9,7 @@ permalink: /docs/pyarrow.html
[Apache Arrow](https://arrow.apache.org) project's
[PyArrow](https://pypi.org/project/pyarrow/) is the recommended package.
```
-pip3 install pyarrow==12.0.0
+pip3 install pyarrow==13.0.0
pip3 install pandas
```
@@ -20,10 +20,17 @@ In [1]: import pyarrow as pa
In [2]: from pyarrow import orc
-In [3]: orc.write_table(pa.table({"col1": [1, 2, 3]}), "test.orc", compression="zstd")
+In [3]: orc.write_table(pa.table({"col1": [1, 2, 3], "col2": ["a", "b", None]}), "test.orc", compression="zstd")
In [4]: orc.read_table("test.orc").to_pandas()
Out[4]:
+ col1 col2
+0 1 a
+1 2 b
+2 3 None
+
+In [5]: orc.read_table("test.orc", columns=["col1"]).to_pandas()
+Out[5]:
col1
0 1
1 2