sacundim opened a new issue, #3744:
URL: https://github.com/apache/arrow-rs/issues/3744

   # Describe the bug
   
   Starting with version 0.6.1 of [csv2parquet](https://github.com/domoritz/csv2parquet) (which upgraded its arrow-rs dependency from 24.0 to 30.0.1), input CSV files that were converting just fine are now failing with errors like this one:
   
   ```
   Error: External(ParseError("Error while parsing value 2020-03-19 00:00:00 for column 4 at line 2"))
   ```
   
   The csv2parquet tool is a very thin wrapper around arrow-rs that basically
   
   1. Calls `arrow::csv::reader::infer_file_schema` to infer a schema for the input file;
   2. Builds an `arrow::csv::Reader` from that schema and uses it to read the file;
   3. Outputs a Parquet file.
   
   The tool has a command-line option that prints the inferred schema, and input values like this one are being inferred as Date64:
   
   ```
       {
         "name": "Sample Date",
         "data_type": "Utf8",
         "nullable": false,
         "dict_id": 0,
         "dict_is_ordered": false
       }
   ```
   
   I've also written a unit test that demonstrates that:
   
   1. Arrow-rs infers the Date64 type for input values like "2020-03-19 00:00:00"
   2. ...but attempting to parse them as Date64 fails
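   The mismatch above is a broken invariant: every value that schema inference classifies as Date64 should also be accepted by the Date64 parser. The std-only Rust sketch below checks that invariant over a set of sample values. Its two predicates, `looks_like_datetime` and `parses_t_only`, are hypothetical simplifications for illustration, not arrow-rs code; in particular, the assumption that the parser only accepts the ISO 8601 `T` separator is a guess at the root cause, not something established in this report.

   ```rust
   /// Returns the samples that `infers_date64` accepts but `parses_date64` rejects.
   /// A non-empty result means inference and parsing disagree.
   fn inferred_but_unparseable<'a>(
       samples: &[&'a str],
       infers_date64: impl Fn(&str) -> bool,
       parses_date64: impl Fn(&str) -> bool,
   ) -> Vec<&'a str> {
       samples
           .iter()
           .copied()
           .filter(|s| infers_date64(*s) && !parses_date64(*s))
           .collect()
   }

   /// Hypothetical inference rule (not arrow-rs code): "YYYY-MM-DD",
   /// then either 'T' or ' ', then "HH:MM:SS".
   fn looks_like_datetime(s: &str) -> bool {
       let b = s.as_bytes();
       b.len() == 19
           && b.iter().enumerate().all(|(i, &c)| match i {
               4 | 7 => c == b'-',
               10 => c == b'T' || c == b' ',
               13 | 16 => c == b':',
               _ => c.is_ascii_digit(),
           })
   }

   /// Hypothetical parser check (an assumption for illustration):
   /// same layout, but only the ISO 8601 'T' separator is accepted.
   fn parses_t_only(s: &str) -> bool {
       looks_like_datetime(s) && s.as_bytes()[10] == b'T'
   }
   ```

   With these two predicates, `inferred_but_unparseable(&["2020-03-19 00:00:00", "2020-03-19T00:00:00"], looks_like_datetime, parses_t_only)` returns only the space-separated value. Plugged into a library's real inference rule and parser, the same check could serve as a regression test for this class of bug.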
   
   # To Reproduce
   
   I wrote a couple of very simple test cases illustrating the problem; I will shortly edit this report to link to them.
   
   # Expected behavior
   
   I see that the various Timestamp types are able to parse strings like this correctly, so I'm not sure whether the more correct behavior for this library would be to infer a Timestamp type for these strings instead of Date64. However, it does seem clear to me that it should be possible to parse these strings as Date64.
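   Arrow's Date64 type stores milliseconds since the UNIX epoch, so accepting these strings mostly means tolerating a space where ISO 8601 puts a `T`. The std-only Rust sketch below shows one way to do that. `parse_date64_ms` is a hypothetical helper, not part of the arrow-rs API, and it assumes UTC, whole seconds, and no timezone offset. The day count uses Howard Hinnant's `days_from_civil` algorithm.

   ```rust
   /// Days since 1970-01-01 for a proleptic Gregorian date
   /// (Howard Hinnant's days_from_civil algorithm).
   fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
       let y = if m <= 2 { y - 1 } else { y }; // Jan/Feb count as months of the previous year
       let era = y.div_euclid(400);
       let yoe = y.rem_euclid(400); // year of era, 0..=399
       let mp = (m + 9) % 12; // March = 0, ..., February = 11
       let doy = (153 * mp + 2) / 5 + d - 1; // day of (March-based) year, 0..=365
       let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
       era * 146_097 + doe - 719_468
   }

   fn days_in_month(y: i64, m: i64) -> i64 {
       let leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
       match m {
           1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
           4 | 6 | 9 | 11 => 30,
           2 if leap => 29,
           2 => 28,
           _ => 0,
       }
   }

   /// Hypothetical helper: parses "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DDTHH:MM:SS"
   /// (UTC, whole seconds) into Date64 milliseconds since the UNIX epoch.
   fn parse_date64_ms(s: &str) -> Option<i64> {
       let s = s.trim();
       let b = s.as_bytes();
       // Check the fixed layout first; ASCII-only input guarantees that the
       // byte-range slices below fall on char boundaries.
       if !s.is_ascii() || b.len() != 19 || b[4] != b'-' || b[7] != b'-'
           || !(b[10] == b' ' || b[10] == b'T') || b[13] != b':' || b[16] != b':'
       {
           return None;
       }
       // Parse one all-digit field (rejects signs such as "+1").
       let field = |from: usize, to: usize| -> Option<i64> {
           let part = &s[from..to];
           if part.bytes().all(|c| c.is_ascii_digit()) { part.parse().ok() } else { None }
       };
       let (y, mo, d) = (field(0, 4)?, field(5, 7)?, field(8, 10)?);
       let (h, mi, sec) = (field(11, 13)?, field(14, 16)?, field(17, 19)?);
       if !(1..=12).contains(&mo) || d < 1 || d > days_in_month(y, mo) || h > 23 || mi > 59 || sec > 59 {
           return None;
       }
       let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + sec;
       Some(secs * 1_000)
   }
   ```

   For example, both `parse_date64_ms("2020-03-19 00:00:00")` and `parse_date64_ms("2020-03-19T00:00:00")` return `Some(1_584_576_000_000)`. Whichever direction the library takes (inferring a Timestamp type, or widening the Date64 parser along these lines), the inference rule and the parser need to accept the same set of strings.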

