jdcasale opened a new issue, #5648: URL: https://github.com/apache/arrow-rs/issues/5648
**Describe the bug**

Decimals in scientific notation are frequently written with an (admittedly unnecessary) positive exponent sign, e.g. "3.106e+04". [The existing regex](https://github.com/apache/arrow-rs/blame/master/arrow-csv/src/reader/mod.rs#L151) allows a negative exponent sign but does not recognize a number with a positive one. As a result, schema inference assigns Utf8 instead of a float type to any column that contains values with a positive exponent sign. As a sanity check, I tried the same input in DuckDB, and its CSV parser does not make this error.

**To Reproduce**

Either attempt to infer the schema for a CSV file containing the offending pattern (as I have done [in this example](https://github.com/jdcasale/arrow-csv-parse-bug/blob/develop/src/main.rs)), or run the existing regex directly against the example offender, "3.106e+04". It will not match.

**Expected behavior**

The decimal regex recognizes "3.106e+04" as a float value, not a Utf8 string.
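
To make the report concrete, here is a minimal std-only sketch of the matching behavior described above. It is not the arrow-csv implementation and does not reproduce its regex: `looks_like_float` and its `allow_plus_exponent` flag are hypothetical names introduced for illustration, and the accepted grammar (optional leading minus, digits with a decimal point and/or an exponent) is an assumption based on the description in this issue. With the flag off, the helper mimics a pattern that accepts only `-` after the exponent marker; with it on, `+` is accepted as well.

```rust
/// Hypothetical helper (not part of arrow-csv): returns true if `s` looks like
/// a float literal such as "1.5", "-2.", ".5" or "3.106e-04".
/// When `allow_plus_exponent` is true, "3.106e+04" is accepted as well.
fn looks_like_float(s: &str, allow_plus_exponent: bool) -> bool {
    let b = s.as_bytes();
    let mut i = 0;

    // Optional leading minus sign.
    if i < b.len() && b[i] == b'-' {
        i += 1;
    }

    // Integer digits (may be empty, as in ".5").
    let int_start = i;
    while i < b.len() && b[i].is_ascii_digit() {
        i += 1;
    }
    let int_digits = i - int_start;

    // Optional decimal point followed by fraction digits (may be empty, as in "2.").
    let mut has_point = false;
    let mut frac_digits = 0;
    if i < b.len() && b[i] == b'.' {
        has_point = true;
        i += 1;
        let frac_start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        frac_digits = i - frac_start;
    }

    // At least one digit is required somewhere in the mantissa.
    if int_digits + frac_digits == 0 {
        return false;
    }

    // Optional exponent: 'e' or 'E', an optional sign, then at least one digit.
    let mut has_exp = false;
    if i < b.len() && (b[i] == b'e' || b[i] == b'E') {
        has_exp = true;
        i += 1;
        if i < b.len() && (b[i] == b'-' || (allow_plus_exponent && b[i] == b'+')) {
            i += 1;
        }
        let exp_start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == exp_start {
            return false;
        }
    }

    // Plain integers such as "42" are not treated as floats here, and the
    // whole input must have been consumed.
    (has_point || has_exp) && i == b.len()
}
```

Calling `looks_like_float("3.106e+04", false)` returns `false`, which mirrors the reported behavior, while the same call with `true` accepts the value. Rust's standard `"3.106e+04".parse::<f64>()` already accepts the explicit `+`, so a value that passes the relaxed check can still be parsed as a float.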
