[
https://issues.apache.org/jira/browse/ARROW-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423719#comment-16423719
]
ASF GitHub Bot commented on ARROW-2014:
---------------------------------------
xhochy closed pull request #1820: ARROW-2014: [Python] Document read_pandas
method in pyarrow.parquet
URL: https://github.com/apache/arrow/pull/1820
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/python/doc/source/parquet.rst b/python/doc/source/parquet.rst
index 3d01e1d3c..b68d4d85d 100644
--- a/python/doc/source/parquet.rst
+++ b/python/doc/source/parquet.rst
@@ -68,7 +68,8 @@ Let's look at a simple table:
    df = pd.DataFrame({'one': [-1, np.nan, 2.5],
                       'two': ['foo', 'bar', 'baz'],
-                      'three': [True, False, True]})
+                      'three': [True, False, True]},
+                      index=list('abc'))
    table = pa.Table.from_pandas(df)
We write this to Parquet format with ``write_table``:
@@ -94,6 +95,13 @@ the whole file (due to the columnar layout):
pq.read_table('example.parquet', columns=['one', 'three'])
+When reading a subset of columns from a file that used a Pandas dataframe as the
+source, we use ``read_pandas`` to maintain any additional index column data:
+
+.. ipython:: python
+
+   pq.read_pandas('example.parquet', columns=['two']).to_pandas()
+
We need not use a string to specify the origin of the file. It can be any of:
* A file path as a string
----------------------------------------------------------------
> [Python] Document read_pandas method in pyarrow.parquet
> -------------------------------------------------------
>
> Key: ARROW-2014
> URL: https://issues.apache.org/jira/browse/ARROW-2014
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Phillip Cloud
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> see discussion in https://github.com/apache/arrow/issues/1302