thisisnic commented on issue #39811: URL: https://github.com/apache/arrow/issues/39811#issuecomment-1916985149
From the perspective of fixing this, I had a look:

* `read_delim_arrow()` contains a function `readr_to_csv_parse_options()` which takes the readr-style parameters passed in and uses them to set up Arrow-compatible options, for example by converting the `col_types` values into a schema.
* In lines 803-806 of `csv_convert_options` we have a check that raises an error if the `col_types` parameter passed into it isn't a schema object.

Basically, what is happening is that we are not calling `readr_to_csv_parse_options()`, so the conversion of `col_types` into a schema never happens.

I think what we need to do here is one of:

a) Set up this schema manually if we need to. This change probably belongs in the body of `check_csv_file_format_args()`, where we check options for validity and set up the various options classes for reading in datasets.

b) Call `readr_to_csv_parse_options()` in `check_csv_file_format_args()`, though I'm not convinced this is the right path, as `open_csv_dataset()` is just a wrapper around `open_dataset(format = "csv")`. The original function `open_dataset()` supports more options than `open_csv_dataset()`, so we might break things if we do this.
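To make option (a) a bit more concrete, here is a minimal sketch of what "setting up the schema manually" from a readr-style compact `col_types` string could look like. The helper `compact_to_schema()` is hypothetical and not part of arrow, it covers only a handful of type codes, and the mapping of codes to Arrow types is an assumption rather than a copy of what the package does internally.

```r
library(arrow)

# Hypothetical helper (not part of arrow): turn a readr-style compact
# col_types string such as "icd" into an Arrow Schema.
# Assumption: only the codes i, d, c and l are handled here.
compact_to_schema <- function(col_types, col_names) {
  # Split the compact string into one code per column
  codes <- strsplit(col_types, "")[[1]]
  stopifnot(length(codes) == length(col_names))

  # Map each code to an Arrow data type; fail loudly on anything else
  types <- lapply(codes, function(code) {
    switch(code,
      "i" = int32(),
      "d" = float64(),
      "c" = utf8(),
      "l" = bool(),
      stop("Unsupported col_types code: ", code)
    )
  })
  names(types) <- col_names

  # schema() accepts name/type pairs, so splice the named list in
  do.call(schema, types)
}

sch <- compact_to_schema("icd", c("id", "name", "score"))
```

Since the check in `csv_convert_options` only accepts a schema, an object like `sch` is the kind of value the dataset code path would need to end up with, whichever of the two options is chosen.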
