[orc] branch main updated: ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1

dongjoon Wed, 23 Aug 2023 19:56:50 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git



The following commit(s) were added to refs/heads/main by this push:
     new e51756a33 ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1
e51756a33 is described below

commit e51756a3378b239f31d5dd9845acb4b518aedb06
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Aug 23 19:56:40 2023 -0700

    ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1
    
    ### What changes were proposed in this pull request?
    
    This PR aims to update the Python documentation with the latest PyArrow 13.0.0 and Dask 2023.8.1.
    
    ### Why are the changes needed?
    
    To recommend the latest versions to users.
    
    ### How was this patch tested?
    
    Manually generated the docs, because this is a documentation change.
    
    ![Screenshot 2023-08-23 at 10 09 46 AM](https://github.com/apache/orc/assets/9700541/372b4e36-92aa-41c6-a64d-ac1a9a5cd3c9)
    
    ![Screenshot 2023-08-23 at 10 10 05 AM](https://github.com/apache/orc/assets/9700541/5102a047-48cb-467e-aa70-1df9298b65a9)
    
    Closes #1597 from dongjoon-hyun/ORC-1491.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 site/_docs/dask.md    | 18 ++++++++++++++----
 site/_docs/pyarrow.md | 11 +++++++++--
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/site/_docs/dask.md b/site/_docs/dask.md
index 963684b08..7719e7d4c 100644
--- a/site/_docs/dask.md
+++ b/site/_docs/dask.md
@@ -9,7 +9,7 @@ permalink: /docs/dask.html
 [Dask](https://dask.org) also supports Apache ORC.
 
 ```
-pip3 install "dask[dataframe]==2022.2.0"
+pip3 install "dask[dataframe]==2023.8.1"
 pip3 install pandas
 ```
 
@@ -20,15 +20,25 @@ In [1]: import pandas as pd
 
 In [2]: import dask.dataframe as dd
 
-In [3]: pf = pd.DataFrame(data={"col1": [1, 2, 3]})
+In [3]: pf = pd.DataFrame(data={"col1": [1, 2, 3], "col2": ["a", "b", None]})
 
 In [4]: dd.to_orc(dd.from_pandas(pf, npartitions=2), path="/tmp/orc")
-Out[4]: (None,)
+Out[4]: (None, None)
 
 In [5]: dd.read_orc(path="/tmp/orc").compute()
 Out[5]:
+   col1  col2
+0     1     a
+1     2     b
+0     3  <NA>
+
+In [6]: dd.read_orc(path="/tmp/orc", columns=["col1"]).compute()
+Out[6]:
    col1
 0     1
 1     2
-2     3
+0     3
 ```
+
+[10 Minutes to Dask](https://docs.dask.org/en/stable/10-minutes-to-dask.html) page
+provides a short overview.
diff --git a/site/_docs/pyarrow.md b/site/_docs/pyarrow.md
index f248563f1..fca23797f 100644
--- a/site/_docs/pyarrow.md
+++ b/site/_docs/pyarrow.md
@@ -9,7 +9,7 @@ permalink: /docs/pyarrow.html
 [Apache Arrow](https://arrow.apache.org) project's [PyArrow](https://pypi.org/project/pyarrow/) is the recommended package.
 
 ```
-pip3 install pyarrow==12.0.0
+pip3 install pyarrow==13.0.0
 pip3 install pandas
 ```
 
@@ -20,10 +20,17 @@ In [1]: import pyarrow as pa
 
 In [2]: from pyarrow import orc
 
-In [3]: orc.write_table(pa.table({"col1": [1, 2, 3]}), "test.orc", compression="zstd")
+In [3]: orc.write_table(pa.table({"col1": [1, 2, 3], "col2": ["a", "b", None]}), "test.orc", compression="zstd")
 
 In [4]: orc.read_table("test.orc").to_pandas()
 Out[4]:
+   col1  col2
+0     1     a
+1     2     b
+2     3  None
+
+In [5]: orc.read_table("test.orc", columns=["col1"]).to_pandas()
+Out[5]:
    col1
 0     1
 1     2
