Riaz Arbi created ARROW-17444:
---------------------------------
Summary: Windows Only: Cannot delete file previously accessed with
open_dataset
Key: ARROW-17444
URL: https://issues.apache.org/jira/browse/ARROW-17444
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 8.0.1, 9.0.0, 8.0.0
Environment: Windows 10
R 4.2.1
RStudio 22.07.1
Arrow 9.0 (fails on arrow 8 as well)
Reporter: Riaz Arbi
Hello,
I encountered this issue because it breaks my tests when I run
{code:java}
rhub::check_for_cran(){code}
Because all other OS checks pass, I know it only affects Windows.
If you write files to a directory using arrow's
{code:java}
write_*{code}
functions and then call
{code:java}
collect(open_dataset(directory)){code}
you cannot delete a file in that directory; the attempt fails with an error.
This is best demonstrated in a reprex:
{code:java}
# setup ------------------------------------------------------------------------
library(arrow)
library(dplyr) # provides %>% and collect()
local_prefix <- tempfile()
df <- data.frame(a = 1:5, b = letters[1:5])
# works ------------------------------------------------------------------------
fs <- LocalFileSystem$create()
fs$CreateDir(local_prefix)
fsdir <- fs$cd(local_prefix)
write_parquet(df, fsdir$path("1.parquet"))
#open_dataset(local_prefix) %>% collect()
fsdir$DeleteFile("1.parquet")
unlink(local_prefix, recursive = TRUE)
# doesn't work -----------------------------------------------------------------
fs <- LocalFileSystem$create()
fs$CreateDir(local_prefix)
fsdir <- fs$cd(local_prefix)
write_parquet(df, fsdir$path("1.parquet"))
open_dataset(local_prefix) %>% collect()
fsdir$DeleteFile("1.parquet")
unlink(local_prefix, recursive = TRUE)
{code}
Here is the error I keep getting:
{code:java}
Error: IOError: Cannot delete file
'C:/Users/riaz/AppData/Local/Temp/Rtmp8qUlcx/file233c22f923d0/1.parquet'.
Detail: [Windows error 32] The process cannot access the file because it is
being used by another process.
{code}
Note that
* I **do not create an object from the `open_dataset` function**. I simply
call it.
* I also call `collect` in order to pull the data, so I cannot see why the
connection to the file should still exist after `collect` is called.
* My environment pane looks identical in both instances.
* I do not need to restart R to delete the file. I can simply clear all
objects from the workspace with `rm(list = ls())`, and then it works fine.
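Since clearing the workspace appears to release the file, one possible stopgap for test teardown is to retry the deletion after triggering garbage collection. The sketch below is only an illustration: `delete_with_retry` is a hypothetical helper written in base R, not part of arrow, and the idea that `gc()` releases the lingering handle is an assumption that this report does not confirm.

```r
# Hypothetical helper (not part of arrow): try to delete a file and, if that
# fails, run the garbage collector and retry a few times.
delete_with_retry <- function(path, tries = 3, wait = 0.1) {
  for (i in seq_len(tries)) {
    # file.remove() returns FALSE (with a warning) when deletion fails
    ok <- suppressWarnings(file.remove(path))
    if (isTRUE(ok)) {
      return(TRUE)
    }
    # Assumption: finalizers run during gc() may release lingering file handles
    gc()
    Sys.sleep(wait)
  }
  # Report whether the file is gone after all attempts
  !file.exists(path)
}

# Usage with a placeholder file standing in for "1.parquet"
tmp <- tempfile(fileext = ".parquet")
writeLines("placeholder", tmp)
deleted <- delete_with_retry(tmp)
```

In the reprex, this would stand in for the `fsdir$DeleteFile("1.parquet")` call, given the full path to the file. If the handle is still held after all retries, the helper returns `FALSE` instead of raising an error, so a test can decide how to react.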