Re: [I] [Python][CSV] Сannot remove columns with the same name from a table [arrow]

via GitHub Mon, 03 Mar 2025 09:33:31 -0800


AlenkaF commented on issue #45590:
URL: https://github.com/apache/arrow/issues/45590#issuecomment-2695037194


   The error comes from our Cython layer
   (https://github.com/apache/arrow/blob/0fbf9823542233c5f32c26534c34cc97ce3f0be2/python/pyarrow/table.pxi#L2437),
   where we look up the index of the unique field with
   [get_field_index()](https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html#pyarrow.Schema.get_field_index).
   The method returns -1 if the name isn’t found or if several fields share
   the given name, hence the error.
   
   The CSV reader supports duplicate column names, but unfortunately not all
   pyarrow Table methods do.
   
   One workaround is to supply column names when reading CSV files. In your
   case, you can also select the columns to keep from the resulting table
   instead of dropping the others:
   
   ```python
   >>> table.select([0, 1, 2, 3])
   pyarrow.Table
   date_time: timestamp[s]
   los+angeles,ca_maxtempC: int64
   los+angeles,ca_mintempC: int64
   los+angeles,ca_totalSnow_cm: double
   ----
   date_time: [[2018-12-11 00:00:00,2018-12-11 03:00:00,2018-12-11 06:00:00,2018-12-11 09:00:00,2018-12-11 12:00:00,...,2019-03-11 09:00:00,2019-03-11 12:00:00,2019-03-11 15:00:00,2019-03-11 18:00:00,2019-03-11 21:00:00]]
   los+angeles,ca_maxtempC: [[20,20,20,20,20,...,19,19,19,19,19]]
   los+angeles,ca_mintempC: [[14,14,14,14,14,...,12,12,12,12,12]]
   los+angeles,ca_totalSnow_cm: [[0,0,0,0,0,...,0,0,0,0,0]]
   ```

