sacundim opened a new issue, #3744: URL: https://github.com/apache/arrow-rs/issues/3744
# Describe the bug

Starting with version 0.6.1 of [csv2parquet](https://github.com/domoritz/csv2parquet) (which upgraded its arrow-rs dependency from 24.0 to 30.0.1), input CSV files that previously converted without problems now fail with errors like this one:

```
Error: External(ParseError("Error while parsing value 2020-03-19 00:00:00 for column 4 at line 2"))
```

The csv2parquet tool is a very thin wrapper around arrow-rs that basically:

1. Calls `arrow::csv::reader::infer_file_schema` to infer a schema for the input file;
2. Builds an `arrow::csv::Reader` from that schema and uses it to read the file;
3. Outputs a Parquet file.

The tool has a command line option that prints the inferred schema, and input values like the one above are being inferred as Date64:

```
{
  "name": "Sample Date",
  "data_type": "Date64",
  "nullable": false,
  "dict_id": 0,
  "dict_is_ordered": false
}
```

I've also written a unit test case that demonstrates that:

1. arrow-rs infers the Date64 type for input values like "2020-03-19 00:00:00"
2. ...but attempting to parse them as Date64 fails

# To Reproduce

I wrote a couple of very simple test cases illustrating the problem. I will edit this report shortly to link them.

# Expected behavior

I see that the various Timestamp types are able to parse strings like that correctly, so I'm not sure whether the more correct behavior for this library would be to infer a Timestamp type for these strings instead of Date64. It does seem clear to me, however, that whatever type is inferred for these strings should also be able to parse them; if Date64 is inferred, it should be possible to parse these strings as Date64.
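To make the shape of the problem concrete, here is a minimal, std-only sketch of how an inference rule and a parser can disagree about the same string. It does not use arrow-rs, and the function names `looks_like_datetime` and `parse_strict` are hypothetical. The assumption that the inference step accepts either `T` or a space as the date/time separator while the parser accepts only `T` is for illustration only; it is not a confirmed description of the library's code.

```rust
/// Returns true if `s` is non-empty and consists only of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical lenient inference rule: accepts `YYYY-MM-DD`, then either
/// 'T' or a space, then `HH:MM:SS`.
fn looks_like_datetime(s: &str) -> bool {
    let b = s.as_bytes();
    // The length and ASCII checks run first, so the slices below are safe.
    b.len() == 19
        && s.is_ascii()
        && all_digits(&s[0..4]) && b[4] == b'-'
        && all_digits(&s[5..7]) && b[7] == b'-'
        && all_digits(&s[8..10])
        && (b[10] == b'T' || b[10] == b' ')
        && all_digits(&s[11..13]) && b[13] == b':'
        && all_digits(&s[14..16]) && b[16] == b':'
        && all_digits(&s[17..19])
}

/// Hypothetical strict parser: same shape, but only 'T' is accepted as the
/// separator. Returns (year, month, day, hour, minute, second) on success.
/// It does not validate ranges such as month <= 12.
fn parse_strict(s: &str) -> Option<(u32, u32, u32, u32, u32, u32)> {
    if !looks_like_datetime(s) || s.as_bytes()[10] != b'T' {
        return None;
    }
    let num = |r: std::ops::Range<usize>| s[r].parse::<u32>().ok();
    Some((
        num(0..4)?,
        num(5..7)?,
        num(8..10)?,
        num(11..13)?,
        num(14..16)?,
        num(17..19)?,
    ))
}
```

Under these assumptions, `looks_like_datetime("2020-03-19 00:00:00")` is true while `parse_strict` on the same string returns `None`. This is the same pattern as the report: a type is inferred for a value that the parser for that type then rejects.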
