alexkreidler opened a new issue, #3324: URL: https://github.com/apache/arrow-rs/issues/3324
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I'm writing some code to infer types from a bunch of different CSV files. I ran it on one and got this error:

```
Sampled 100000 lines from ./backup/Book-0.tsv
Error: Csv error: Encountered UTF-8 error while reading CSV file: invalid utf-8: invalid UTF-8 in field 4 near byte index 149
```

because it contained this string: `serving as the navy�s liaison`

**Describe the solution you'd like**

I'd like to be able to pass an additional field to the `ReaderOptions` struct parameter of `infer_reader_schema_with_csv_options`, or better yet to the `infer_reader_schema` function, and have the library silently continue on non-UTF-8 values. It could still output their schema type as utf8, or a `NullType`, or even better `BinaryType`.

**Describe alternatives you've considered**

I could handle this in my code. I imagine there may be many users with non-UTF-8 CSVs who would still like to pass the data verbatim through Apache Arrow.

**Additional context**

Diving into the code, it looks like we'd need to use `read_byte_record` instead of `read_record` below. I'm not sure of the extent of the changes this would require in the `arrow-csv` crate.

https://github.com/apache/arrow-rs/blob/9e39f96b121d88b7427295bd326d14bb78d0fb39/arrow-csv/src/reader.rs#L487-L499
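To make the requested behaviour concrete, here is a minimal std-only sketch of what handling raw field bytes could look like once the reader hands them over (as `read_byte_record` would). It is not `arrow-csv` code: `FieldKind`, `classify_field`, and `decode_lossy` are hypothetical names, and the byte `0x92` is only an assumed example of an invalid byte, since I have not checked which byte the file actually contains.

```rust
use std::borrow::Cow;

/// Hypothetical classification of a raw CSV field (not an arrow-csv type).
#[derive(Debug, PartialEq)]
enum FieldKind {
    Utf8,
    Binary,
}

/// Report valid UTF-8 fields as Utf8 and anything else as Binary,
/// instead of returning an error for the whole file.
fn classify_field(raw: &[u8]) -> FieldKind {
    match std::str::from_utf8(raw) {
        Ok(_) => FieldKind::Utf8,
        Err(_) => FieldKind::Binary,
    }
}

/// Alternative: keep the field as text, replacing invalid bytes with U+FFFD.
fn decode_lossy(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}

fn main() {
    // Assumption: 0x92 stands in for the offending byte; it is a lone
    // continuation byte and therefore invalid UTF-8.
    let raw: &[u8] = b"serving as the navy\x92s liaison";

    assert_eq!(classify_field(raw), FieldKind::Binary);
    assert_eq!(decode_lossy(raw), "serving as the navy\u{FFFD}s liaison");
    println!("{}", decode_lossy(raw));
}
```

The first function corresponds to the "output `BinaryType`" idea and the second to "still output utf8"; which one fits better probably depends on whether callers need the original bytes preserved.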
