Matthew Roeschke created ARROW-17360:
----------------------------------------
Summary: [Python] pyarrow.orc.ORCFile.read does not preserve ordering of columns
Key: ARROW-17360
URL: https://issues.apache.org/jira/browse/ARROW-17360
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 8.0.1
Reporter: Matthew Roeschke
xref [https://github.com/pandas-dev/pandas/issues/47944]
{code:python}
In [1]: df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
# pandas main branch / 1.5
In [2]: df.to_orc("abc")
In [3]: pd.read_orc("abc", columns=['b', 'a'])
Out[3]:
   a  b
0  1  a
1  2  b
2  3  c
In [4]: import pyarrow.orc as orc
In [5]: orc_file = orc.ORCFile("abc")
# reordered to a, b
In [6]: orc_file.read(columns=['b', 'a']).to_pandas()
Out[6]:
   a  b
0  1  a
1  2  b
2  3  c
# reordered to a, b
In [7]: orc_file.read(columns=['b', 'a'])
Out[7]:
pyarrow.Table
a: int64
b: string
----
a: [[1,2,3]]
b: [["a","b","c"]]
{code}
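Until {{ORCFile.read}} honours the requested order, one possible workaround is to reorder the returned table with {{pyarrow.Table.select}}, which returns the columns in the order they are passed. This is a sketch only: it assumes a pyarrow build with ORC support, uses {{pyarrow.orc.write_table}} just to recreate the file from the report, and {{read_orc_columns_in_order}} is a hypothetical helper, not a pyarrow API.

{code:python}
import os
import tempfile

import pyarrow as pa
import pyarrow.orc as orc


def read_orc_columns_in_order(path, columns):
    """Hypothetical helper (not part of pyarrow).

    ORCFile.read(columns=...) may return the columns in file-schema order,
    so the result is reordered with Table.select(), which keeps the order
    of the list it is given.
    """
    table = orc.ORCFile(path).read(columns=columns)
    return table.select(columns)


# Recreate the file from the report: "a" (int64) and "b" (string).
source = pa.table({"a": [1, 2, 3], "b": ["a", "b", "c"]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc.orc")
    orc.write_table(source, path)
    # The returned Table is held in memory, so it outlives the temp dir.
    result = read_orc_columns_in_order(path, ["b", "a"])

print(result.column_names)  # ['b', 'a']
{code}

On the pandas side, indexing the result with the same list afterwards ({{pd.read_orc("abc", columns=cols)[cols]}}) should have the same effect, since selecting DataFrame columns with a list returns them in list order.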