[ https://issues.apache.org/jira/browse/ARROW-18242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lucas Mation updated ARROW-18242:
---------------------------------
Description:
Sorry for so many issues, but I think this is another bug.
The arrow implementation of `lubridate::dmy` behaves incorrectly: an invalid date
such as '00001976' is parsed as a valid (and completely unrelated) date instead of NA.
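library(arrow)
library(dplyr)
library(lubridate)
library(data.table)
# In R: lubridate rejects the invalid date
'00001976' %>% dmy
[1] NA
Warning message:
All formats failed to parse. No formats found.
# In arrow: the same string is parsed into a date
q <- data.table(x=c('00001976','30111976','01011976'))
q %>% write_dataset('q')
q2 <- 'q' %>% open_dataset %>% mutate(x2=dmy(x)) %>% collect
q2
          x         x2
1: 00001976 1975-11-30
2: 30111976 1976-11-30
3: 01011976 1976-01-01
# Notice that '00001976' is an invalid date. The first row of x2 should be NA.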
was:
Sorry for so many issues, but I think this is another bug.
Wrong behavior of the arrow implementation of the `lubridate::dmy`.
An invalid date such as '00001976' is being parsed as a valid (and completely
unrelated) date.
#in R
'00001976' %>% dmy
[1] NA
Warning message:
All formats failed to parse. No formats found.
#In arrow
q <- data.table(x=c('00001976','30111976','01011976'))
q %>% write_dataset(paste0(p2,'/q'))
q2 <- paste0(p2,'/q') %>% open_dataset %>% mutate(x2=dmy(x)) %>% collect
q2
x
1: 1975-11-30
2: 1976-11-30
3: 1976-01-01
#notice '00001976' is an invalid date. First row of x2 should be NA!!!
> [R] arrow implementation of lubridate::dmy parses invalid date "00001976" as
> date
> ---------------------------------------------------------------------------------
>
> Key: ARROW-18242
> URL: https://issues.apache.org/jira/browse/ARROW-18242
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Lucas Mation
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)