AlenkaF commented on issue #45590: URL: https://github.com/apache/arrow/issues/45590#issuecomment-2695037194
The error comes from our Cython layer https://github.com/apache/arrow/blob/0fbf9823542233c5f32c26534c34cc97ce3f0be2/python/pyarrow/table.pxi#L2437 where we look up the index of the field by name with [get_field_index()](https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html#pyarrow.Schema.get_field_index). The method returns -1 if the name isn't found or if several fields share the given name, hence the error.

The CSV reader supports duplicate column names, but unfortunately not all pyarrow Table methods do.

The workaround would be to supply column names when reading CSV files. Or, in your case, select the columns you want from the resulting table instead of dropping the others:

```python
>>> table.select([0, 1, 2])
pyarrow.Table
date_time: timestamp[s]
los+angeles,ca_maxtempC: int64
los+angeles,ca_mintempC: int64
los+angeles,ca_totalSnow_cm: double
----
date_time: [[2018-12-11 00:00:00,2018-12-11 03:00:00,2018-12-11 06:00:00,2018-12-11 09:00:00,2018-12-11 12:00:00,...,2019-03-11 09:00:00,2019-03-11 12:00:00,2019-03-11 15:00:00,2019-03-11 18:00:00,2019-03-11 21:00:00]]
los+angeles,ca_maxtempC: [[20,20,20,20,20,...,19,19,19,19,19]]
los+angeles,ca_mintempC: [[14,14,14,14,14,...,12,12,12,12,12]]
los+angeles,ca_totalSnow_cm: [[0,0,0,0,0,...,0,0,0,0,0]]
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org