[ https://issues.apache.org/jira/browse/ARROW-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-6231:
-----------------------------------------
    Labels: csv  (was: )

> [Python] Consider assigning default column names when reading CSV file and header_rows=0
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-6231
>                 URL: https://issues.apache.org/jira/browse/ARROW-6231
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: csv
>             Fix For: 0.15.0
>
>
> This is a slight usability rough edge. Assigning default names (like "f0, f1,
> ...") would probably be better since then at least you can see how many
> columns there are and what is in them.
> {code}
> In [10]: parse_options = csv.ParseOptions(delimiter='|', header_rows=0)
>
> In [11]: %time table = csv.read_csv('Performance_2016Q4.txt', parse_options=parse_options)
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <timed exec> in <module>
> ~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/_csv.pyx in pyarrow._csv.read_csv()
> ~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: header_rows == 0 needs explicit column names
> {code}
> In pandas integers are used, so some kind of default string would have to be
> defined
> {code}
> In [18]: df = pd.read_csv('Performance_2016Q4.txt', sep='|', header=None, low_memory=False)
>
> In [19]: df.columns
> Out[19]:
> Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
>             17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
>            dtype='int64')
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
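The "f0, f1, ..." proposal in ARROW-6231 amounts to counting the fields in the data and generating one placeholder name per field. The snippet below is a minimal sketch of that idea using only Python's standard `csv` module. It is not pyarrow's implementation; the helper name `default_column_names` and the sample rows are hypothetical, and it assumes the first row has the same number of fields as the rest of the file.

```python
import csv
import io
from typing import Iterable, List


def default_column_names(lines: Iterable[str], delimiter: str = "|") -> List[str]:
    """Hypothetical helper: return placeholder names f0, f1, ... for a headerless CSV.

    Assumption: the column count is taken from the first row only, so a
    ragged file would need extra validation that this sketch does not do.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    first_row = next(reader, [])  # empty input -> no columns
    return [f"f{i}" for i in range(len(first_row))]


# Usage with a small in-memory, headerless, pipe-delimited sample (made-up data).
sample = io.StringIO("100|2016-10-01|4.25\n101|2016-11-01|3.99\n")
names = default_column_names(sample, delimiter="|")
print(names)  # ['f0', 'f1', 'f2']
```

Names generated this way could then be supplied explicitly, which is what the `ArrowInvalid` message asks for. Recent pyarrow releases also expose `column_names` and `autogenerate_column_names` (which yields "f0", "f1", ...) on `pyarrow.csv.ReadOptions`; check the documentation for the version you have installed.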