[ 
https://issues.apache.org/jira/browse/ARROW-18242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629023#comment-17629023
 ] 

Nicola Crane commented on ARROW-18242:
--------------------------------------

OK, my best guess is that the original lubridate implementation uses a custom 
C parser to process these datetimes, whereas in the Arrow implementation some 
of this work is done by whichever external library we depend on for datetime 
parsing, which would explain the difference between Windows and Linux.  We 
might be able to add some pre-processing steps to our bindings (or to the 
regex that their setup code produces) to prevent this.  
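
To make the pre-processing idea concrete, here is a rough sketch in plain base 
R. It is not the Arrow bindings code, and the helper names `is_plausible_dmy` 
and `strict_dmy` are hypothetical. It assumes 8-digit "ddmmyyyy" input and 
rejects strings whose day or month field is out of range before they reach 
the parser.

```r
# Hypothetical helper: TRUE only for 8-digit "ddmmyyyy" strings whose day and
# month fields fall in a plausible range (01-31 and 01-12).
is_plausible_dmy <- function(x) {
  is_8_digits <- grepl("^[0-9]{8}$", x)
  day <- suppressWarnings(as.integer(substr(x, 1, 2)))
  month <- suppressWarnings(as.integer(substr(x, 3, 4)))
  is_8_digits & !is.na(day) & !is.na(month) &
    day >= 1L & day <= 31L & month >= 1L & month <= 12L
}

# Hypothetical wrapper: values that fail the pre-check become NA instead of
# being handed to the underlying date parser.
strict_dmy <- function(x) {
  out <- rep(as.Date(NA), length(x))
  ok <- is_plausible_dmy(x)
  out[ok] <- as.Date(x[ok], format = "%d%m%Y")
  out
}

# The three values from the report: the first comes back as NA, the other
# two parse as 1976-11-30 and 1976-01-01.
strict_dmy(c("00001976", "30111976", "01011976"))
```

The range check is deliberately coarse: it only catches fields like "00", and 
leaves per-month day limits to the parser itself.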

> [R] arrow implementation of lubridate::dmy parses invalid date "00001976" as 
> date
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-18242
>                 URL: https://issues.apache.org/jira/browse/ARROW-18242
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Lucas Mation
>            Priority: Critical
>
> Sorry for so many issues, but I think this is another bug.
> Wrong behavior of the arrow implementation of `lubridate::dmy`.
> An invalid date such as '00001976' is being parsed as a valid (and completely 
> unrelated) date.
> #in R
> '00001976' %>% dmy
> [1] NA
> Warning message:
>   All formats failed to parse. No formats found. 
> #In arrow
> q <- data.table(x=c('00001976','30111976','01011976'))
> q %>% write_dataset('q')
> q2 <- 'q' %>% open_dataset %>% mutate(x2=dmy(x)) %>% collect
> q2
> x
> 1: 1975-11-30
> 2: 1976-11-30
> 3: 1976-01-01
> #notice '00001976' is an invalid date. First row of x2 should be NA!!!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
