[ 
https://issues.apache.org/jira/browse/ARROW-17444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riaz Arbi updated ARROW-17444:
------------------------------
    Description: 
Hello,

I encountered this issue because it breaks my tests when I run
{code:java}
rhub::check_for_cran(){code}
Because of this, I know it only affects Windows; all other OS checks pass.

 

If you write files to a directory using arrow's 
{code:java}
write_*{code}
 functions, and then 
{code:java}
collect(open_dataset(directory)){code}
 

 you cannot delete a file in the directory; the attempt fails with an error. 
This is best demonstrated in a reprex:

 
{code:java}
# setup ------------------------------------------------------------------------
library(arrow)
library(dplyr) # provides %>% and collect()
local_prefix <- tempfile()
df <- data.frame(a = 1:5, b = letters[1:5])
# works ------------------------------------------------------------------------
fs <- LocalFileSystem$create()
fs$CreateDir(local_prefix)
fsdir <- fs$cd(local_prefix)
write_parquet(df, fsdir$path("1.parquet"))
#open_dataset(local_prefix) %>% collect()
fsdir$DeleteFile("1.parquet")
unlink(local_prefix, recursive = TRUE)
# doesn't work -----------------------------------------------------------------
fs <- LocalFileSystem$create()
fs$CreateDir(local_prefix)
fsdir <- fs$cd(local_prefix)
write_parquet(df, fsdir$path("1.parquet"))
open_dataset(local_prefix) %>% collect()
fsdir$DeleteFile("1.parquet")
unlink(local_prefix, recursive = TRUE)
 
 
{code}
 

Here is the error I keep getting:

 
{code:java}
Error: IOError: Cannot delete file 
'C:/Users/riaz/AppData/Local/Temp/Rtmp8qUlcx/file233c22f923d0/1.parquet'. 
Detail: [Windows error 32] The process cannot access the file because it is 
being used by another process.
{code}
 

Note that
 * I *do not create an object from the `open_dataset` function*. I simply 
call it.
 * I also call `collect` in order to pull the data, so I cannot see why the 
connection to the file should still exist after `collect` is called.
 * My environment pane looks identical in both instances.
 * I do not need to restart R to delete the file. I can simply clear all 
objects from the workspace with `rm(list = ls())`, and then it works fine.
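
Since clearing the workspace is enough to release the file, a possible workaround for test teardown is to force a garbage collection before deleting. The sketch below uses base R only. `delete_with_gc` is a hypothetical helper, not part of arrow, and it rests on the unconfirmed assumption that the lingering file handle belongs to an unreferenced object that is released when it is garbage collected.

```r
# Hypothetical helper (not part of arrow): try to delete a file, forcing a
# garbage collection before each attempt so that unreferenced objects that
# may still hold the file open get a chance to be finalized.
delete_with_gc <- function(path, attempts = 3, wait = 0.1) {
  for (i in seq_len(attempts)) {
    gc()                  # collect unreachable objects and run their finalizers
    unlink(path)          # base R delete; returns a status instead of raising an error
    if (!file.exists(path)) {
      return(TRUE)        # the file is gone
    }
    Sys.sleep(wait)       # brief pause before the next attempt
  }
  FALSE                   # the file is still present after all attempts
}

# Usage with a plain temporary file
tmp <- tempfile(fileext = ".parquet")
file.create(tmp)
deleted <- delete_with_gc(tmp)
```

In the reprex, this would stand in for the `fsdir$DeleteFile("1.parquet")` call, given the full path to the file. Whether it actually avoids Windows error 32 has not been verified here.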


> Windows Only: Cannot delete file previously accessed with open_dataset
> ----------------------------------------------------------------------
>
>                 Key: ARROW-17444
>                 URL: https://issues.apache.org/jira/browse/ARROW-17444
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 8.0.0, 9.0.0, 8.0.1
>         Environment: Windows 10
> R 4.2.1
> RStudio 22.07.1
> Arrow 9.0 (fails on arrow 8 as well)
>            Reporter: Riaz Arbi
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
