[https://issues.apache.org/jira/browse/ARROW-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563958#comment-17563958]
Will Jones commented on ARROW-17002:
------------------------------------
Thanks for reporting this! Earlier I was looking at what I think is a similar
issue with 7.0.0, but I found that I couldn't reproduce it in 8.0.0.
> R dplyr queries create locks on FileSystemDataset files
> -------------------------------------------------------
>
> Key: ARROW-17002
> URL: https://issues.apache.org/jira/browse/ARROW-17002
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Adam Black
> Priority: Minor
>
> I think that dplyr queries on FileSystemDataset objects create file locks
> that persist after the query has finished. This issue only seems to occur on
> Windows (I'm using Windows 10). Calling the garbage collector after the dplyr
> query seems to release the lock.
> {code:r}
> library(arrow)
> library(dplyr)
> # I can delete an arrow dataset that has been opened
> write_dataset(iris, "iris")
> ds <- open_dataset("iris")
> file.exists("iris")
> #> [1] TRUE
> print(unlink("iris", recursive = T))
> #> [1] 0
> file.exists("iris")
> #> [1] FALSE
> # However, if I run a dplyr query on the data before deleting it, the file is locked.
> write_dataset(iris, "iris")
> ds <- open_dataset("iris")
> file.exists("iris")
> #> [1] TRUE
> # I think this adds a lock that is not removed
> ds %>% count() %>% collect()
> #> # A tibble: 1 x 1
> #>       n
> #>   <int>
> #> 1   150
> print(unlink("iris", recursive = T))
> #> [1] 1
> file.exists("iris")
> #> [1] TRUE
> print(unlink("iris", recursive = T, force = T))
> #> [1] 1
> file.exists("iris")
> #> [1] TRUE
> file.remove("iris/part-0.parquet")
> #> Warning in file.remove("iris/part-0.parquet"): cannot remove file 'iris/
> #> part-0.parquet', reason 'Permission denied'
> #> [1] FALSE
> # running gc() will clean up the lock and allow the file to be deleted
> gc()
> #>           used (Mb) gc trigger  (Mb) max used (Mb)
> #> Ncells 1178845   63    2349652 125.5  1656436 88.5
> #> Vcells 2093715   16    8388608  64.0  3170844 24.2
> print(unlink("iris", recursive = T))
> #> [1] 0
> file.exists("iris")
> #> [1] FALSE
> sessioninfo::session_info()
> #> - Session info ---------------------------------------------------------------
> #>  setting  value
> #>  version  R version 4.0.5 (2021-03-31)
> #>  os       Windows 10 x64
> #>  system   x86_64, mingw32
> #>  ui       RTerm
> #>  language (EN)
> #>  collate  English_United States.1252
> #>  ctype    English_United States.1252
> #>  tz       America/New_York
> #>  date     2022-07-07
> #>
> #> - Packages -------------------------------------------------------------------
> #>  package     * version date       lib source
> #>  arrow       * 8.0.0   2022-05-09 [1] CRAN (R 4.0.5)
> #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.5)
> #>  backports     1.4.0   2021-11-23 [1] CRAN (R 4.0.5)
> #>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.5)
> #>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.5)
> #>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.0.5)
> #>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.0.5)
> #>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.0.5)
> #>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.5)
> #>  dplyr       * 1.0.8   2022-02-08 [1] CRAN (R 4.0.5)
> #>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
> #>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.5)
> #>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.0.5)
> #>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.5)
> #>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.5)
> #>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.0.5)
> #>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.5)
> #>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.5)
> #>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.0.5)
> #>  knitr         1.36    2021-09-29 [1] CRAN (R 4.0.5)
> #>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.0.5)
> #>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.5)
> #>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.0.5)
> #>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.5)
> #>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.5)
> #>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.0.5)
> #>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.0.5)
> #>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.0.5)
> #>  rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.0.5)
> #>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.5)
> #>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.5)
> #>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.0.5)
> #>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.5)
> #>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.0.5)
> #>  tibble        3.1.2   2021-05-16 [1] CRAN (R 4.0.5)
> #>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.0.5)
> #>  tzdb          0.2.0   2021-10-27 [1] CRAN (R 4.0.5)
> #>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
> #>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
> #>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.0.5)
> #>  xfun          0.25    2021-08-06 [1] CRAN (R 4.0.5)
> #>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.5)
> #>
> #> [1] C:/Users/adam.DESKTOP-D3KQQA1/Documents/R/win-library/4.0
> #> [2] C:/Program Files/R/R-4.0.5/library
> {code}
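Until the root cause is found, one possible workaround follows directly from the report's observation that gc() releases the lock: force a garbage collection before deleting the dataset. The helper unlink_dataset() below is hypothetical (it is not part of arrow) and is only a sketch. It assumes that the lingering locks belong to unreachable R objects whose finalizers run during gc(). The report suggests this, but it has not been confirmed.

{code:r}
# Hypothetical helper, not part of the arrow package.
# Runs R's garbage collector first so that unreachable objects left over
# from an earlier query can be finalized (which, per the report, releases
# the Windows file lock), then deletes the directory recursively.
# Like base::unlink(), it returns 0 on success and 1 on failure.
unlink_dataset <- function(path) {
  gc()  # not auto-printed inside a function
  unlink(path, recursive = TRUE)
}

# Usage with the reproducer above:
# ds %>% count() %>% collect()
# print(unlink_dataset("iris"))
{code}

This only hides the symptom. Any object that is still reachable, such as ds itself, is not collected, so its resources are unaffected.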
--
This message was sent by Atlassian Jira
(v8.20.10#820010)