This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-1.7
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/branch-1.7 by this push:
new 3e97a42 ORC-1112: Add `Using with Python` web page (#1039)
3e97a42 is described below
commit 3e97a425979252887c7ff3bbbf697934206357cd
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Tue Feb 1 16:24:43 2022 -0800
ORC-1112: Add `Using with Python` web page (#1039)
### What changes were proposed in this pull request?
This PR adds a `Using with Python` web page to the Apache ORC website for
Python users in the community.
### Why are the changes needed?
To help Python users use the `Apache Arrow` project with the latest `Apache
ORC 1.7.x C++` release.
### How was this patch tested?
Build the docs and check the generated website. The embedded code can be
tested with `PyArrow 6.0.1 (latest)` and will be improved in `PyArrow 7.0` via
[ARROW-15338: [Python] Add pyarrow.orc.read_table
API](https://github.com/apache/arrow/commit/ff4b9bea56aeb2c48f19d6137dd2fbae59d618c7)
<img width="581" alt="Screen Shot 2022-02-01 at 2 28 15 PM"
src="https://user-images.githubusercontent.com/9700541/152062188-d9d3309a-9367-49dc-b8ea-0f4bac8d9919.png">
<img width="100%" alt="Screen Shot 2022-02-01 at 2 29 23 PM"
src="https://user-images.githubusercontent.com/9700541/152062356-934b366f-040b-4fa7-8beb-27ae786e028b.png">
This closes #1027
(cherry picked from commit 50ba8bb643a51132849ed61a7462ed5d229812e7)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
site/_data/docs.yml | 4 ++++
site/_docs/pyarrow.md | 37 +++++++++++++++++++++++++++++++++++++
site/index.html | 1 +
3 files changed, 42 insertions(+)
diff --git a/site/_data/docs.yml b/site/_data/docs.yml
index 855abbd..f4354aa 100644
--- a/site/_data/docs.yml
+++ b/site/_data/docs.yml
@@ -11,6 +11,10 @@
- building
- releases
+- title: Using in Python
+ docs:
+ - pyarrow
+
- title: Using in Spark
docs:
- spark-ddl
diff --git a/site/_docs/pyarrow.md b/site/_docs/pyarrow.md
new file mode 100644
index 0000000..a598779
--- /dev/null
+++ b/site/_docs/pyarrow.md
@@ -0,0 +1,37 @@
+---
+layout: docs
+title: PyArrow
+permalink: /docs/pyarrow.html
+---
+
+## How to install
+
+Apache Arrow project's PyArrow is the recommended package.
+
+https://pypi.org/project/pyarrow/
+
+```
+pip3 install pyarrow
+pip3 install pandas
+```
+
+## How to write and read an ORC file
+
+```
+In [1]: import pandas as pd
+
+In [2]: import pyarrow as pa
+
+In [3]: import pyarrow.orc as orc
+
+In [4]: orc.write_table(pa.table({"col1": [1, 2, 3]}), "test.orc")
+
+In [5]: t = orc.ORCFile("test.orc").read()
+
+In [6]: t.to_pandas()
+Out[6]:
+ col1
+0 1
+1 2
+2 3
+```
diff --git a/site/index.html b/site/index.html
index 7b02453..a07f5a7 100644
--- a/site/index.html
+++ b/site/index.html
@@ -44,6 +44,7 @@ overview: true
<div class="unit golden-large code">
<p class="title">Quickstart Documentation</p>
<ul class="shell">
+ <li><a href="docs/pyarrow.html">Using with Python</a></li>
<li><a href="docs/spark-ddl.html">Using with Spark</a></li>
<li><a href="docs/hive-ddl.html">Using with Hive</a></li>
<li><a href="docs/mapred.html">Using with Hadoop MapRed</a></li>