jwijffels opened a new issue, #50009:
URL: https://github.com/apache/arrow/issues/50009
### Describe the bug, including details regarding any error messages, version, and platform.
Hello.
I'm using the R package arrow (24.0.0) on Ubuntu, installed with
install.packages("arrow") from https://r2u.stat.illinois.edu/ubuntu, which
pulls in the deb file r-cran-arrow_24.0.0-1.ca2204.1_amd64.deb.
In a process that I start with Rscript myprocess.R, I use `read_parquet` at
the beginning to read some parquet data from S3. At the end of the process I
call `read_parquet` again and then quit R with quit(save = "no").
Sometimes the process takes 30 to 40 minutes, and that works fine. When R
quits, it apparently cleans up all resources, which means arrow is cleaned up
somewhere as well.
If the process takes longer than 42 minutes, quitting segfaults at
FinalizeS3. This never happens when the process takes less time than that.
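For reference, the pattern described above can be sketched roughly as follows.
This is an untested, minimal sketch and not the actual myprocess.R: the
function name `run_repro`, the `wait_minutes` default, and the S3 path are
placeholders, and replacing the real work with a plain `Sys.sleep` is an
assumption that may or may not trigger the same crash.

```r
# Hypothetical sketch of the process layout; names and paths are placeholders.
run_repro <- function(uri, wait_minutes = 45) {
  # Read parquet data from S3 at the start of the process.
  first <- arrow::read_parquet(uri)

  # Stand-in for the long-running work between the two reads (assumption:
  # the real process does actual computation here, not a sleep).
  Sys.sleep(wait_minutes * 60)

  # Read parquet data from S3 again at the end of the process.
  last <- arrow::read_parquet(uri)

  invisible(list(first = first, last = last))
}

# Intended use inside a script started with Rscript (placeholder path):
# run_repro("s3://some-whatever-s3-path/data.parquet")
# quit(save = "no")
```

The two commented lines at the bottom show how the sketch would be called from
a script, with quit(save = "no") as the last statement, as in the real process.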
This is the error message I see in the logs. The R session is killed at
quit.
> s3_write_rds(trans, s3_path(settings$folder_output$raw, "rawdata.rds"))
[1] "s3://some-whatever-s3-path/rawdata.rds"
> quit(save = "no")
Error in FinalizeS3() : ignoring SIGPIPE signal
Calls: <Anonymous> -> FinalizeS3
*** caught segfault ***
address 0x30, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...
My guess is that R is stopped by this code:
https://github.com/apache/arrow/blob/62cdda9cb79a5d5e5e188b0aeb42316431eecc85/r/src/filesystem.cpp#L352-L357
What are my options to keep arrow from killing R?
### Component(s)
R