Wes McKinney created ARROW-6231:
-----------------------------------
Summary: [Python] Consider assigning default column names when
reading CSV file and header_rows=0
Key: ARROW-6231
URL: https://issues.apache.org/jira/browse/ARROW-6231
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Wes McKinney
Fix For: 0.15.0
This is a slight usability rough edge: with header_rows=0, read_csv currently
raises an error instead of reading the file. Assigning default names (like
"f0, f1, ...") would probably be better, since then at least you can see how
many columns there are and what is in them.
{code}
In [10]: parse_options = csv.ParseOptions(delimiter='|', header_rows=0)
In [11]: %time table = csv.read_csv('Performance_2016Q4.txt',
                                    parse_options=parse_options)
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<timed exec> in <module>
~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/_csv.pyx in pyarrow._csv.read_csv()
~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: header_rows == 0 needs explicit column names
{code}
In pandas, integers are used as the default column labels, so for Arrow some
kind of default string name would have to be defined:
{code}
In [18]: df = pd.read_csv('Performance_2016Q4.txt', sep='|', header=None,
                          low_memory=False)
In [19]: df.columns
Out[19]:
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
           dtype='int64')
{code}
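As a rough illustration of the "f0, f1, ..." idea, the sketch below derives
default string names from the first row of a delimited file. It uses only the
Python standard library and is not how the Arrow CSV reader is or would be
implemented. The helper name default_column_names, the "f" prefix parameter,
and the sample row are all hypothetical.

```python
import csv
import io


def default_column_names(first_line, delimiter="|", prefix="f"):
    """Return hypothetical default names f0, f1, ... for one delimited row."""
    # Parse the row with the csv module so that delimiters inside quoted
    # fields are not miscounted as column boundaries.
    row = next(csv.reader(io.StringIO(first_line), delimiter=delimiter))
    return [f"{prefix}{i}" for i in range(len(row))]


# Made-up sample row with three pipe-delimited fields.
names = default_column_names("100001|2016-10-01|3.5")
print(names)  # ['f0', 'f1', 'f2']
```

For the 31-column file in the pandas example above, this scheme would yield
f0 through f30, mirroring the integer labels 0 through 30.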