jorgecarleitao commented on a change in pull request #8611:
URL: https://github.com/apache/arrow/pull/8611#discussion_r520319490
##########
File path: rust/arrow/src/csv/reader.rs
##########
@@ -219,6 +226,35 @@ pub fn infer_schema_from_files(
Schema::try_merge(&schemas)
}
+/// Parses a string into the specified `ArrowPrimitiveType`.
+fn parse_field<T: ArrowPrimitiveType>(s: &str) -> Result<T::Native> {
+ let from_ymd = chrono::NaiveDate::from_ymd;
+ let since = chrono::NaiveDate::signed_duration_since;
+
+ match T::DATA_TYPE {
+ DataType::Boolean => s
+ .to_lowercase()
+ .parse::<T::Native>()
+ .map_err(|_| ArrowError::ParseError("Error parsing boolean".to_string())),
+ DataType::Date32(DateUnit::Day) => {
+ let days = chrono::NaiveDate::parse_from_str(s, "%Y-%m-%d")
+ .map(|t| since(t, from_ymd(1970, 1, 1)).num_days() as i32);
+ days.map(|t| unsafe { std::mem::transmute_copy::<i32, T::Native>(&t) })
Review comment:
As @vertexclique said: `transmute` is one of the most unsafe operations
in Rust, and this can easily lead to undefined behavior if it overflows.
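To make the concern concrete: `transmute_copy::<i32, T::Native>` reads `size_of::<T::Native>()` bytes from a 4-byte source, so any native type wider than `i32` reads out of bounds, and the preceding `as i32` cast silently truncates the `i64` returned by `num_days()`. A minimal sketch of a checked alternative follows. The function name `days_to_native` and the `TryFrom<i64>` bound are assumptions for illustration; they stand in for `T::Native` and are not the arrow crate's actual API.

```rust
use std::convert::TryFrom;

/// Converts a day count into a native integer type without `unsafe`.
///
/// `N` is a hypothetical stand-in for `T::Native`; the trait bounds
/// actually available on Arrow native types may differ.
fn days_to_native<N: TryFrom<i64>>(days: i64) -> Result<N, String> {
    // `try_from` returns an error when the value does not fit,
    // instead of truncating or reinterpreting raw bytes.
    N::try_from(days)
        .map_err(|_| format!("day count {} does not fit the target type", days))
}
```

With this shape, an out-of-range day count surfaces as an error that the caller can map to `ArrowError::ParseError`, rather than as a wrapped value or undefined behavior.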
##########
File path: rust/arrow/src/csv/reader.rs
##########
@@ -67,6 +67,9 @@ lazy_static! {
.case_insensitive(true)
.build()
.unwrap();
+ static ref DATE_RE: Regex = Regex::new(r"^\d\d\d\d-\d\d-\d\d$").unwrap();
Review comment:
Isn't there a `\d{4}` or something like that? It may make the pattern a
bit easier to read and more expressive, IMO.