tustvold commented on code in PR #3209:
URL: https://github.com/apache/arrow-rs/pull/3209#discussion_r1033541345


##########
arrow-csv/src/reader.rs:
##########
@@ -70,9 +70,11 @@ lazy_static! {
         .case_insensitive(true)
         .build()
         .unwrap();
-    static ref DATE_RE: Regex = Regex::new(r"^\d{4}-\d\d-\d\d$").unwrap();
+    static ref DATE32_RE: Regex = Regex::new(r"^\d{4}-\d\d-\d\d$").unwrap();
+    static ref DATE64_RE: Regex =
+        Regex::new(r"^\d{4}-\d\d-\d\d(T|\s)\d\d:\d\d:\d\d$").unwrap();
     static ref DATETIME_RE: Regex =
-        Regex::new(r"^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d$").unwrap();
+        Regex::new(r"^\d{4}-\d\d-\d\d(T|\s)\d\d:\d\d:\d\d(.\d{1,9})?$").unwrap();

Review Comment:
   ```suggestion
           Regex::new(r"^\d{4}-\d\d-\d\d[T ]\d\d:\d\d:\d\d(.\d{1,9})?$").unwrap();
   ```
   
   Or at the very least use a non-capturing group. I also think the separator should probably be a literal space (` `) rather than `\s`, since I don't think things like `\n` or `\t` would parse correctly.
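
To make the `\s` versus ` ` point concrete, here is a minimal, dependency-free sketch. It does not use the `regex` crate; instead it hand-checks the `YYYY-MM-DD<sep>HH:MM:SS` shape with a pluggable separator predicate. The names `has_datetime_shape`, `sep_t_or_space` and `sep_t_or_whitespace` are hypothetical and not part of arrow-csv. It also assumes ASCII digits only and ignores the optional fractional seconds, so it only approximates the patterns in the diff.

```rust
/// Returns true if `s` has the shape `YYYY-MM-DD<sep>HH:MM:SS`, where `<sep>`
/// is a single character accepted by `sep_ok`. Only ASCII digits are accepted
/// (a simplification for this sketch).
fn has_datetime_shape(s: &str, sep_ok: fn(char) -> bool) -> bool {
    let chars: Vec<char> = s.chars().collect();
    // 10 characters of date + 1 separator + 8 characters of time.
    if chars.len() != 19 {
        return false;
    }
    chars.iter().enumerate().all(|(i, &c)| match i {
        4 | 7 => c == '-',
        10 => sep_ok(c),
        13 | 16 => c == ':',
        _ => c.is_ascii_digit(),
    })
}

/// Mirrors the character class `[T ]`: a literal `T` or a single space.
fn sep_t_or_space(c: char) -> bool {
    c == 'T' || c == ' '
}

/// Mirrors `(T|\s)`: a literal `T` or any whitespace character.
fn sep_t_or_whitespace(c: char) -> bool {
    c == 'T' || c.is_whitespace()
}

fn main() {
    let inputs = [
        "2022-11-28T10:15:30",
        "2022-11-28 10:15:30",
        "2022-11-28\t10:15:30",
        "2022-11-28\n10:15:30",
    ];
    for s in inputs {
        println!(
            "{:?}: [T ] = {}, (T|\\s) = {}",
            s,
            has_datetime_shape(s, sep_t_or_space),
            has_datetime_shape(s, sep_t_or_whitespace)
        );
    }
}
```

With the whitespace predicate, the tab- and newline-separated inputs are classified as datetimes during inference even though the downstream parser may reject them; the `[T ]` variant only accepts the two separators the parser is expected to handle.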



##########
arrow-csv/src/reader.rs:
##########
@@ -90,10 +92,12 @@ fn infer_field_schema(string: &str, datetime_re: Option<Regex>) -> DataType {
         DataType::Float64
     } else if INTEGER_RE.is_match(string) {
         DataType::Int64
-    } else if datetime_re.is_match(string) {
-        DataType::Date64
-    } else if DATE_RE.is_match(string) {
+    } else if DATE32_RE.is_match(string) {

Review Comment:
   FWIW I created #3211 to track using a `RegexSet` here, as the current code is rather wasteful. Perhaps something for a follow-on PR?



##########
arrow-csv/src/reader.rs:
##########
@@ -70,9 +70,11 @@ lazy_static! {
         .case_insensitive(true)
         .build()
         .unwrap();
-    static ref DATE_RE: Regex = Regex::new(r"^\d{4}-\d\d-\d\d$").unwrap();
+    static ref DATE32_RE: Regex = Regex::new(r"^\d{4}-\d\d-\d\d$").unwrap();
+    static ref DATE64_RE: Regex =
+        Regex::new(r"^\d{4}-\d\d-\d\d(T|\s)\d\d:\d\d:\d\d$").unwrap();

Review Comment:
   ```suggestion
           Regex::new(r"^\d{4}-\d\d-\d\d[T ]\d\d:\d\d:\d\d$").unwrap();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
