[ 
https://issues.apache.org/jira/browse/ARROW-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506955#comment-17506955
 ] 

Dragoș Moldovan-Grünfeld commented on ARROW-15665:
--------------------------------------------------

I will ask the question here and move it if this is the wrong place or a 
follow-up Jira is required. In R, {{strptime}} returns NA / NULL in the 
following circumstances:
# 1 {{format}} doesn't match {{string}} - e.g. {{"1999-12-31"}} and {{"%Y-%d-%M"}} 
# 2 {{string}} doesn't make sense, given the format - e.g. {{string}} is 
{{"this is a string that doesn't make sense"}}
# 3 the {{string}} can be parsed with the given {{format}}, but only by 
implicitly relying on rollover - e.g. {{string}} is {{"1999-02-30"}}, which 
with rollover would be parsed as the date {{"1999-03-02"}}.

I am not sure 1 and 2 are actually different, but R's behaviour in case 3 
differs from the current Arrow behaviour:

{code:r}
library(arrow, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

df <- tibble::tibble(string_date = "1999-02-30")

df %>% 
  mutate(date = strptime(string_date, format = "%Y-%m-%d"))
#> # A tibble: 1 × 2
#>   string_date date  
#>   <chr>       <dttm>
#> 1 1999-02-30  NA

df %>% 
  arrow_table() %>% 
  mutate(date = strptime(string_date, format = "%Y-%m-%d")) %>% 
  collect()
#> # A tibble: 1 × 2
#>   string_date date               
#>   <chr>       <dttm>             
#> 1 1999-02-30  1999-03-02 00:00:00
{code} 

How are things done in Python? Does the R behaviour align with your 
expectations, or does it break any ISO standard?
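
As one point of comparison, here is a minimal sketch of how Python's standard-library {{datetime.strptime}} treats cases 2 and 3. This is an assumption about what "Python" means here: it says nothing about pyarrow or pandas, and the helper name {{try_parse}} is hypothetical. The standard library raises {{ValueError}} in both cases instead of rolling over, so the helper maps that to {{None}}.

```python
from datetime import datetime


def try_parse(string, fmt):
    """Hypothetical helper: return the parsed datetime, or None on failure."""
    try:
        return datetime.strptime(string, fmt)
    except ValueError:
        # datetime.strptime raises ValueError both for strings that do not
        # match the format and for impossible dates such as 30 February.
        return None


# Case 2: the string does not resemble a date at all.
nonsense = try_parse("this is a string that doesn't make sense", "%Y-%m-%d")

# Case 3: the string matches the format, but the date does not exist.
rollover = try_parse("1999-02-30", "%Y-%m-%d")

# A valid date, for contrast.
valid = try_parse("1999-12-31", "%Y-%m-%d")

print(nonsense, rollover, valid)
```

In this sketch the first two results are {{None}} and only the last one is a real datetime, which is closer to the R behaviour above than to the rollover currently produced by Arrow.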

> [C++] Add error handling option to StrptimeOptions
> --------------------------------------------------
>
>                 Key: ARROW-15665
>                 URL: https://issues.apache.org/jira/browse/ARROW-15665
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Rok Mihevc
>            Assignee: Rok Mihevc
>            Priority: Major
>              Labels: kernel, pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We want to have an option to either raise, ignore or return NA in case of 
> format mismatch.
> See 
> [pandas.to_datetime|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html]
>  and lubridate 
> [parse_date_time|https://lubridate.tidyverse.org/reference/parse_date_time.html]
>  for examples.



