ablack3 opened a new issue, #14474:
URL: https://github.com/apache/arrow/issues/14474

   I would like to create a FileSystemDataset object in a temp folder, process 
it in batches, and then remove it. This works fine on Linux and Mac, but on 
Windows a file lock prevents removal of the temp folder. I think the lock is 
created by arrow and is not released until I manually call the garbage collector. 
I don't think I should (or am allowed to) call the garbage collector from 
inside a function that is part of a CRAN-hosted R package. So how do I release 
the file lock on Windows so I can delete the FileSystemDataset?
   
   Reproducible example below.
   
   
   ``` r
   library(arrow)
   
   # create a FileSystemDataset object
   
   filename <- here::here("tmp")
   write_dataset(cars, filename, format = "feather")
   ds <- open_dataset(filename, format = "feather")
   ds
   #> FileSystemDataset with 1 Feather file
   #> speed: double
   #> dist: double
   #> 
   #> See $metadata for additional Schema metadata
   
   # process the file in batches
   scanner <- ScannerBuilder$create(ds)$BatchSize(batch_size = 4)$Finish()
   reader <- scanner$ToRecordBatchReader()
   
   batch_num <- 1
   while(!is.null(batch <- reader$read_next_batch())) {
     print(paste("Reading batch", batch_num, "with", nrow(batch), "rows"))
     batch_num <- batch_num + 1
   }
   #> [1] "Reading batch 1 with 4 rows"
   #> [1] "Reading batch 2 with 4 rows"
   #> [1] "Reading batch 3 with 4 rows"
   #> [1] "Reading batch 4 with 4 rows"
   #> [1] "Reading batch 5 with 4 rows"
   #> [1] "Reading batch 6 with 4 rows"
   #> [1] "Reading batch 7 with 4 rows"
   #> [1] "Reading batch 8 with 4 rows"
   #> [1] "Reading batch 9 with 4 rows"
   #> [1] "Reading batch 10 with 4 rows"
   #> [1] "Reading batch 11 with 4 rows"
   #> [1] "Reading batch 12 with 4 rows"
   #> [1] "Reading batch 13 with 2 rows"
   
   rm(reader)
   rm(scanner)
   rm(ds)
   
   # try to remove the dataset folder
   rc <- unlink(filename, recursive = TRUE)
   if(rc == 1) print("removal of file failed")
   #> [1] "removal of file failed"
   
   file.exists(filename)
   #> [1] TRUE
   
   # call gc()
   gc()
   #>           used (Mb) gc trigger  (Mb) max used (Mb)
   #> Ncells 1115105 59.6    2401181 128.3  1234217 66.0
   #> Vcells 1937727 14.8    8388608  64.0  3294370 25.2
   
   # try to remove the dataset folder again, now that gc() has run
   rc <- unlink(filename, recursive = TRUE)
   if(rc == 1) print("removal of file failed")
   
   file.exists(filename)
   #> [1] FALSE
   
   ```
   
   <sup>Created on 2022-10-19 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   <details style="margin-bottom:10px;">
   <summary>
   Session info
   </summary>
   
   ``` r
   sessioninfo::session_info()
   #> - Session info ---------------------------------------------------------------
   #>  setting  value                       
   #>  version  R version 4.0.5 (2021-03-31)
   #>  os       Windows 10 x64              
   #>  system   x86_64, mingw32             
   #>  ui       RTerm                       
   #>  language (EN)                        
   #>  collate  English_United States.1252  
   #>  ctype    English_United States.1252  
   #>  tz       America/New_York            
   #>  date     2022-10-19                  
   #> 
   #> - Packages -------------------------------------------------------------------
   #>  package     * version date       lib source        
   #>  arrow       * 9.0.0.2 2022-10-02 [1] CRAN (R 4.0.5)
   #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.5)
   #>  backports     1.4.0   2021-11-23 [1] CRAN (R 4.0.5)
   #>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.5)
   #>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.5)
   #>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.0.5)
   #>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.0.5)
   #>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.0.5)
   #>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.5)
   #>  dplyr         1.0.8   2022-02-08 [1] CRAN (R 4.0.5)
   #>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
   #>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.5)
   #>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.0.5)
   #>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.5)
   #>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.5)
   #>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.0.5)
   #>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.0.5)
   #>  here          1.0.1   2020-12-13 [1] CRAN (R 4.0.5)
   #>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.5)
   #>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.0.5)
   #>  knitr         1.36    2021-09-29 [1] CRAN (R 4.0.5)
   #>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.0.5)
   #>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.5)
   #>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.0.5)
   #>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.5)
   #>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.5)
   #>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.0.5)
   #>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.0.5)
   #>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.0.5)
   #>  rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.0.5)
   #>  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.5)
   #>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.5)
   #>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.5)
   #>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.0.5)
   #>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.5)
   #>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.0.5)
   #>  tibble        3.1.2   2021-05-16 [1] CRAN (R 4.0.5)
   #>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.0.5)
   #>  tzdb          0.2.0   2021-10-27 [1] CRAN (R 4.0.5)
   #>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
   #>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
   #>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.0.5)
   #>  xfun          0.25    2021-08-06 [1] CRAN (R 4.0.5)
   #>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.5)
   #> 
   #> [1] C:/Users/adam.DESKTOP-D3KQQA1/Documents/R/win-library/4.0
   #> [2] C:/Program Files/R/R-4.0.5/library
   ```
   
   </details>
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.