[
https://issues.apache.org/jira/browse/ARROW-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446972#comment-16446972
]
ASF GitHub Bot commented on ARROW-2453:
---------------------------------------
xhochy closed pull request #1923: ARROW-2453: [Python] Improve Table column access
URL: https://github.com/apache/arrow/pull/1923
This is a PR merged from a forked repository.
Because GitHub hides the original diff of a pull request from a fork once
it is merged, the diff is reproduced below for the sake of provenance:
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index cbf2a69f7..cbbfe7da8 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -1152,7 +1152,30 @@ cdef class Table:
         self._check_nullptr()
         return pyarrow_wrap_schema(self.table.schema())
 
-    def column(self, int i):
+    def column(self, i):
+        """
+        Select a column by its column name, or numeric index.
+
+        Parameters
+        ----------
+        i : int or string
+
+        Returns
+        -------
+        pyarrow.Column
+        """
+        if isinstance(i, six.string_types):
+            field_index = self.schema.get_field_index(i)
+            if field_index < 0:
+                raise KeyError("Column {} does not exist in table".format(i))
+            else:
+                return self._column(field_index)
+        elif isinstance(i, six.integer_types):
+            return self._column(i)
+        else:
+            raise TypeError("Index must either be string or integer")
+
+    def _column(self, int i):
         """
         Select a column by its numeric index.
diff --git a/python/pyarrow/tests/test_table.py b/python/pyarrow/tests/test_table.py
index 5303cb219..492cf0089 100644
--- a/python/pyarrow/tests/test_table.py
+++ b/python/pyarrow/tests/test_table.py
@@ -301,6 +301,23 @@ def test_table_from_arrays_invalid_names():
         pa.Table.from_arrays(data, names=['a'])
 
 
+def test_table_select_column():
+    data = [
+        pa.array(range(5)),
+        pa.array([-10, -5, 0, 5, 10]),
+        pa.array(range(5, 10))
+    ]
+    table = pa.Table.from_arrays(data, names=('a', 'b', 'c'))
+
+    assert table.column('a').equals(table.column(0))
+
+    with pytest.raises(KeyError):
+        table.column('d')
+
+    with pytest.raises(TypeError):
+        table.column(None)
+
+
 def test_table_add_column():
     data = [
         pa.array(range(5)),
----------------------------------------------------------------
> [Python] Improve Table column access
> ------------------------------------
>
> Key: ARROW-2453
> URL: https://issues.apache.org/jira/browse/ARROW-2453
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.9.0
> Reporter: Antoine Pitrou
> Assignee: Kee Chong Tan
> Priority: Major
> Labels: beginner, pull-request-available
> Fix For: 0.10.0
>
>
> Suppose you have a table column named "nulls". Right now, to access it on a
> table, you need to do something like this:
> {code:python}
> >>> table.column(table.schema.get_field_index('nulls'))
> <pyarrow.lib.Column object at 0x7fe4144d2570>
> chunk 0: <pyarrow.lib.NullArray object at 0x7fe3db51b4a8>
> [
> NA,
> NA,
> NA
> ]
> {code}
> Also, if you mistype the column name, instead of getting an error you get an
> arbitrary column:
> {code}
> >>> table.column(table.schema.get_field_index('z'))
> <pyarrow.lib.Column object at 0x7fe3dbd6cc30>
> chunk 0: <pyarrow.lib.Int64Array object at 0x7fe3db54b408>
> [
> 0,
> 1,
> 2
> ]
> {code}
> {{Table.column()}} should accept a string object and return the column with
> the corresponding name. A KeyError should be raised if there is no column with
> such a name.
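
The dispatch logic that the merged diff adds to Table.column() can be
sketched in plain Python. This is only an illustration under stated
assumptions: TinyTable is a hypothetical stand-in for pyarrow.Table, its
columns are ordinary lists, and the built-in str and int types replace
six.string_types and six.integer_types. It is not the pyarrow
implementation.

```python
class TinyTable:
    """Hypothetical stand-in for pyarrow.Table (illustration only)."""

    def __init__(self, names, columns):
        self._names = list(names)
        self._columns = list(columns)

    def get_field_index(self, name):
        # Assumed to return -1 for an unknown name, matching the
        # "field_index < 0" check in the diff.
        try:
            return self._names.index(name)
        except ValueError:
            return -1

    def _column(self, i):
        # Positional access only, like the renamed _column() in the diff.
        return self._columns[i]

    def column(self, i):
        # Same three-way dispatch as the patched Table.column().
        if isinstance(i, str):
            field_index = self.get_field_index(i)
            if field_index < 0:
                raise KeyError("Column {} does not exist in table".format(i))
            return self._column(field_index)
        elif isinstance(i, int):
            return self._column(i)
        else:
            raise TypeError("Index must either be string or integer")


table = TinyTable(("a", "b", "c"), ([0, 1, 2], [-10, -5, 0], [5, 6, 7]))

# Name-based and index-based access agree.
assert table.column("a") == table.column(0)

# The old idiom silently returns some column for a mistyped name; in this
# sketch the index -1 selects the last list element.
wrong = table._column(table.get_field_index("z"))
assert wrong == table.column("c")
```

With the name check inside column(), a mistyped name fails with a KeyError
instead of silently selecting an unrelated column.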