Will Jones created ARROW-11095:
----------------------------------
Summary: [Python] Access pyarrow.RecordBatch column by name
Key: ARROW-11095
URL: https://issues.apache.org/jira/browse/ARROW-11095
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Will Jones
I propose adding support for selecting a column out of a pyarrow.RecordBatch
using both __getitem__() and .field(), like we have in pyarrow.Table.
pyarrow.RecordBatch has a pretty similar API to pyarrow.Table (e.g. both have
filter and take methods and a schema), but I got tripped up on this difference.
pyarrow.Table supports accessing columns by name using both __getitem__ and
.field():
{code:python}
import pyarrow as pa

my_array = pa.array(range(10))
table = pa.Table.from_arrays([my_array], names=['my_column'])
# Both of these work on table:
table['my_column']
table.field('my_column')
{code}
Meanwhile pyarrow.RecordBatch doesn't support either of those. In fact, I had a
hard time finding a way to grab a column by name from a RecordBatch without
first looking up the integer index.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)