davlee1972 commented on issue #46223:
URL: https://github.com/apache/arrow/issues/46223#issuecomment-2828452101

That's because the first batch is all 1s, so the initial schema generated for the RecordBatchReader has Int64 for the column. Reading the batch containing "A" as an Int64 then fails, and that happens before `cast()` ever runs. The following works: explicitly map column names to column types via `ConvertOptions(column_types=...)`. Passing this argument disables type inference on the defined columns.

```
with open("/tmp/mixed.csv", "w") as f:
    f.write("mixed_column\n")
    f.write("1\n" * 1000000)
    f.write("A\n")

import pyarrow.csv

def cast_test():
    # Pin the column to string so inference never guesses Int64 from the first batch.
    c_options = pyarrow.csv.ConvertOptions(column_types={"mixed_column": "string"})
    print(c_options)
    with pyarrow.csv.open_csv('/tmp/mixed.csv', convert_options=c_options) as r:
        for batch in r:
            print(batch)

cast_test()
```

Personally, I use pyarrow.dataset() for everything instead of the csv, parquet, etc. classes.
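For reference, a minimal sketch of that pyarrow.dataset() route, reusing the same /tmp/mixed.csv file from above: ds.CsvFileFormat can wrap the same pyarrow.csv.ConvertOptions, so the dataset layer gets the explicit column type as well. Treat this as illustrative rather than the only way to wire it up.

```
import pyarrow as pa
import pyarrow.csv
import pyarrow.dataset as ds

# Reuse the explicit column type so no batch is ever inferred as Int64.
csv_format = ds.CsvFileFormat(
    convert_options=pyarrow.csv.ConvertOptions(
        column_types={"mixed_column": pa.string()}
    )
)

dataset = ds.dataset("/tmp/mixed.csv", format=csv_format)

# Stream batch by batch, analogous to the open_csv() reader above.
for batch in dataset.to_batches():
    print(batch.num_rows, batch.schema)
```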
That's because the first batch is all 1s so the initial schema it generates for the recordbatchreader is Int64 for the column. It fails to read the batch with "A" in it as a Int64 which happens before cast(). The following works.. Explicitly map column names to column types. Passing this argument disables type inference on the defined columns. ``` with open("/tmp/mixed.csv", "w") as f: f.write("mixed_column\n") f.write("1\n" * 1000000) f.write("A\n") import pyarrow.csv import pyarrow.lib import pyarrow def cast_test(): c_options = pyarrow.csv.ConvertOptions(column_types={"mixed_column": "string"}) print(c_options) with pyarrow.csv.open_csv('/tmp/mixed.csv', convert_options=c_options) as r: for batch in r: print(batch) cast_test() ``` Personally I use pyarrow.dataset() for everything instead of the csv, parquet, etc. classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org