[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12153: ARROW-15338: [Python] Add `pyarrow.orc.read_table` API

GitBox Wed, 19 Jan 2022 11:05:19 -0800


jorisvandenbossche commented on a change in pull request #12153:
URL: https://github.com/apache/arrow/pull/12153#discussion_r788056019




##########
File path: python/pyarrow/orc.py
##########
@@ -175,3 +176,33 @@ def write_table(table, where):
     writer = ORCWriter(where)
     writer.write(table)
     writer.close()
+
+
+def read_table(source, columns=None, filesystem=None):
+    """
+    Read a table from ORC format
+
+    Parameters
+    ----------
+    source : str, pyarrow.NativeFile, or file-like object
+        If a string passed, can be a single file name or directory name. For
+        file-like objects, only read a single file. Use pyarrow.BufferReader to
+        read a file contained in a bytes or buffer-like object.
+    columns : list
+        If not None, only these columns will be read from the file. A column
+        name may be a prefix of a nested field, e.g. 'a' will select 'a.b',
+        'a.c', and 'a.d.e'. If empty, no columns will be read. Note
+        that the table will still have the correct num_rows set despite having
+        no columns.
+    filesystem : FileSystem, default None
+        If nothing passed, paths assumed to be found in the local on-disk
+        filesystem.

Review comment:
       FYI, I already created a JIRA for this documentation issue:
https://issues.apache.org/jira/browse/ARROW-15364 (it is also outdated in the
parquet docs).
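
For reference, the prefix rule described for `columns` in the docstring above ('a' selects 'a.b', 'a.c', and 'a.d.e'; an empty list selects nothing) can be illustrated with a small pure-Python sketch. The helper `select_columns` and the sample field paths are hypothetical and only mimic the documented behaviour; this is not how pyarrow implements column selection, and it assumes nested fields are addressed by dot-separated paths.

```python
def select_columns(leaf_paths, columns=None):
    """Return the leaf field paths selected by `columns`.

    Hypothetical helper, not part of pyarrow: it only mimics the prefix
    rule stated in the read_table docstring.
    """
    if columns is None:
        # None means "no restriction": every field is kept.
        return list(leaf_paths)
    selected = []
    for path in leaf_paths:
        for name in columns:
            # A name matches the field itself or any field nested under it.
            if path == name or path.startswith(name + "."):
                selected.append(path)
                break
    return selected


# Assumed example schema, flattened to dot-separated leaf paths.
leaves = ["a.b", "a.c", "a.d.e", "ab", "f"]

print(select_columns(leaves, ["a"]))  # ['a.b', 'a.c', 'a.d.e']
print(select_columns(leaves, []))     # []
print(select_columns(leaves, None))   # ['a.b', 'a.c', 'a.d.e', 'ab', 'f']
```

Note that 'ab' is not selected by 'a': in this sketch the prefix has to end at a field boundary, which is one reading of "prefix of a nested field".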
