[jira] [Created] (ARROW-6231) [Python] Consider assigning default column names when reading CSV file and header_rows=0

Wes McKinney (JIRA) Tue, 13 Aug 2019 20:03:16 -0700

Wes McKinney created ARROW-6231:
-----------------------------------

             Summary: [Python] Consider assigning default column names when reading CSV file and header_rows=0
                 Key: ARROW-6231
                 URL: https://issues.apache.org/jira/browse/ARROW-6231
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.15.0



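This is a minor usability rough edge: reading a CSV file with header_rows=0 
currently raises an error unless column names are given explicitly. Assigning 
default names (like "f0, f1, ...") would probably be better, since then you can 
at least see how many columns there are and what is in them.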

{code}
In [10]: parse_options = csv.ParseOptions(delimiter='|', header_rows=0)

In [11]: %time table = csv.read_csv('Performance_2016Q4.txt', parse_options=parse_options)
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<timed exec> in <module>

~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/_csv.pyx in pyarrow._csv.read_csv()

~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: header_rows == 0 needs explicit column names
{code}

In pandas, integers are used as the column labels in this case, so some kind of 
default string would have to be defined for Arrow.

{code}
In [18]: df = pd.read_csv('Performance_2016Q4.txt', sep='|', header=None, low_memory=False)

In [19]: df.columns
Out[19]:
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
           dtype='int64')
{code}
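
To make the proposal concrete, here is a minimal sketch of the naming scheme suggested above, written against the Python standard library only. It is not pyarrow's implementation: the helper name read_headerless, the "f" prefix parameter, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def read_headerless(text, delimiter="|", prefix="f"):
    """Parse delimited text with no header row and generate default names.

    Hypothetical helper for illustration; not part of pyarrow or pandas.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    # The column count is taken from the first data row; names are
    # prefix + positional index, i.e. f0, f1, f2, ...
    names = [f"{prefix}{i}" for i in range(len(rows[0]))]
    return names, rows


# Made-up sample data standing in for a pipe-delimited file without a header.
sample = "100|2016-10-01|3.5\n101|2016-11-01|3.75\n"
names, rows = read_headerless(sample)
print(names)
print(rows[0])
```

With names generated this way, the first row is kept as data rather than consumed as a header, and the number of columns is visible from the names alone.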



