jorisvandenbossche commented on issue #35661:
URL: https://github.com/apache/arrow/issues/35661#issuecomment-1559745327
The behaviour you notice indeed comes from first parsing the value as a float
and only afterwards casting it to string. However, if you use pyarrow's CSV
reader directly and pass the `column_types` argument, the column is parsed as a
string from the start and no digits are lost:
```python
>>> import pyarrow as pa
>>> from pyarrow import csv
>>> csv.read_csv("bug.csv")
pyarrow.Table
user_id: double
value: int64
----
user_id: [[1225717802.1679842]]
value: [[33]]
>>> csv.read_csv("bug.csv",
...     convert_options=csv.ConvertOptions(column_types={"user_id": pa.string()}))
pyarrow.Table
user_id: string
value: int64
----
user_id: [["1225717802.1679841607"]]
value: [[33]]
```
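The precision loss in the float path can be reproduced with plain Python, without pyarrow. This is a minimal sketch, not part of either library: a float64 carries only about 15 to 17 significant decimal digits, while the `user_id` above has 20, so parsing it as a float and then formatting it back to a string cannot recover the original text. The value comes from the output above; the variable names are illustrative.

```python
# The user_id from the example above: 20 significant digits,
# more than a float64 can represent exactly.
raw = "1225717802.1679841607"

# Path 1: parse as float first, then cast to string (the lossy path).
as_float = float(raw)
via_float = str(as_float)

# Path 2: keep the text as a string from the start, which is what
# column_types={"user_id": pa.string()} achieves in the CSV reader.
as_string = raw

print(via_float)  # → 1225717802.1679842
print(as_string)  # → 1225717802.1679841607
```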
So I assume this is actually a bug in pandas after all, in how pandas
integrates with the pyarrow CSV reader and translates its own arguments into
the arguments passed to pyarrow. I am therefore closing this issue and will
re-open the one on the pandas side (https://github.com/pandas-dev/pandas/issues/53269).