[
https://issues.apache.org/jira/browse/ARROW-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579614#comment-17579614
]
Miles Granger edited comment on ARROW-16421 at 8/15/22 9:46 AM:
----------------------------------------------------------------
[~westonpace] while working on ARROW-13763, I noticed that files on the C++
side are not closed in places one might initially expect. For example, in
file_ipc.cc,
[IpcFileFormat::Inspect|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/cpp/src/arrow/dataset/file_ipc.cc#L134]
could close the `reader` and then return the schema, but doesn't. Likewise,
[SerializedFile::Close|https://github.com/apache/arrow/blob/bc1a16cd0eceeffe67893a7e8000d2dd28dcf3f1/cpp/src/parquet/file_reader.cc#L293]
(ParquetFileReader::Contents) doesn't close the file; it deals with
decryption keys instead. It would be possible, however, to add a
{{source_.get()->Close()}} call immediately afterwards to close the
RandomAccessFile.
Is my understanding correct? If so, this seems like it could be related to this
issue, but I suppose there may be reasons for not closing the files there?
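To illustrate the pattern I have in mind, here is a minimal sketch. It is not Arrow code: {{FakeFile}}, {{FakeReader}} and {{InspectAndClose}} are hypothetical stand-ins I made up for a RandomAccessFile, a file reader and an Inspect-style function. It assumes the reader shares ownership of the file and that closing it explicitly is safe once the schema has been copied out.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a random-access file handle (not Arrow's class).
class FakeFile {
 public:
  void Close() { closed_ = true; }
  bool closed() const { return closed_; }

 private:
  bool closed_ = false;
};

// Hypothetical reader that shares ownership of the file it reads from.
class FakeReader {
 public:
  FakeReader(std::shared_ptr<FakeFile> source, std::string schema)
      : source_(std::move(source)), schema_(std::move(schema)) {}

  const std::string& schema() const { return schema_; }

  // Close the underlying file explicitly instead of waiting for the last
  // shared_ptr owner to be destroyed.
  void Close() {
    if (source_ && !source_->closed()) source_->Close();
  }

 private:
  std::shared_ptr<FakeFile> source_;
  std::string schema_;
};

// Sketch of an Inspect-style function: copy the schema out, close the
// reader (and with it the file), then return the schema.
std::string InspectAndClose(const std::shared_ptr<FakeFile>& file) {
  FakeReader reader(file, "a: int32");   // schema string is a placeholder
  std::string schema = reader.schema();  // copy before closing
  reader.Close();
  return schema;
}
```

With this shape the caller still holds a pointer to the file, but the handle itself is already closed when the function returns, which is what would matter for deleting the file on Windows.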
> [R] Permission error on Windows when deleting file in dataset
> -------------------------------------------------------------
>
> Key: ARROW-16421
> URL: https://issues.apache.org/jira/browse/ARROW-16421
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Affects Versions: 7.0.0
> Reporter: Will Jones
> Assignee: Will Jones
> Priority: Major
>
> On Windows this fails:
> {code:R}
> library(arrow)
> write_dataset(iris, "test_dataset")
> # Original example was with DuckDB, but that's not necessarily the issue
> # con <- open_dataset("test_dataset") |> to_duckdb()
> con <- open_dataset("test_dataset")$NewScan()$Finish()$ToRecordBatchReader()
> file.remove("test_dataset/part-0.parquet")
> #> Warning in file.remove("test_dataset/part-0.parquet"): cannot remove file
> #> 'test_dataset/part-0.parquet', reason 'Permission denied'
> #> [1] FALSE
> {code}
> But on macOS it does not:
> {code:r}
> library(arrow)
> write_dataset(iris, "test_dataset")
> # Original example was with DuckDB, but that's not necessarily the issue
> # con <- open_dataset("test_dataset") |> to_duckdb()
> con <- open_dataset("test_dataset")$NewScan()$Finish()$ToRecordBatchReader()
> file.remove("test_dataset/part-0.parquet")
> #> [1] TRUE
> {code}