[
https://issues.apache.org/jira/browse/ARROW-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583194#comment-17583194
]
Vitalie Spinu commented on ARROW-16680:
---------------------------------------
The AWS SDK does set the option
[https://github.com/aws/aws-sdk-cpp/blob/main/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp#L142]
to 1. The curl [doc|https://curl.se/libcurl/c/CURLOPT_NOSIGNAL.html] for
CURLOPT_NOSIGNAL is a bit confusing, though:
{noformat}
Setting CURLOPT_NOSIGNAL to 1 makes libcurl NOT ask the system to ignore
SIGPIPE signals, which otherwise are sent by the system when trying to send
data to a socket which is closed in the other end. libcurl makes an effort to
never cause such SIGPIPEs to trigger, but some operating systems have no way to
avoid them and even on those that have there are some corner cases when they
may still happen, contrary to our desire. In addition, using CURLAUTH_NTLM_WB
authentication could cause a SIGCHLD signal to be raised.{noformat}
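To make the failure mode from that quote concrete, here is a minimal POSIX sketch (C++, Linux or macOS assumed; the function name is hypothetical and not part of Arrow or the AWS SDK). It writes to a socket whose other end is closed. Because the demo ignores SIGPIPE process-wide, `write()` fails with `EPIPE` instead of raising a signal.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/socket.h>
#include <unistd.h>

// Demo only (hypothetical helper): ignores SIGPIPE for the whole process,
// which is fine in a throwaway test program but is not something a library
// loaded into R should do on its own. Then it writes to a socket whose peer
// has been closed and returns the errno from write(), or 0 if the write
// unexpectedly succeeded.
int write_to_closed_peer_errno() {
  std::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return errno;
  close(fds[1]);  // "a socket which is closed in the other end"

  const char byte = 'x';
  const ssize_t n = write(fds[0], &byte, 1);
  const int err = (n < 0) ? errno : 0;  // capture errno before close()
  close(fds[0]);
  return err;
}
```

Without the `SIG_IGN` line, the default action for SIGPIPE would terminate the process inside `write()`. If the host process has installed its own SIGPIPE handler, that handler runs instead.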
It looks like there is no way to reliably avoid those SIGPIPEs. Hence, maybe
the right approach would be to handle SIGPIPEs the way the
[plasma code does it|https://github.com/apache/arrow/blob/3c7a0cad0e25ed66e4c555d9da49f320f803573c/cpp/src/plasma/plasma.h#L52-L67]?
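One way to "handle" SIGPIPE without touching the process-wide disposition (which R, Rscript or littler own) is sketched below, assuming POSIX threads. This is not the plasma code. It is a different, commonly used pattern: block SIGPIPE on the calling thread around the write, and if that write raised the signal, consume it before restoring the mask. `write_no_sigpipe` and `closed_peer_write_errno` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <pthread.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not an Arrow, AWS SDK, or plasma API): write() with
// SIGPIPE blocked on the calling thread only. A closed peer then shows up as
// -1/EPIPE, and the process-wide SIGPIPE disposition is never modified.
ssize_t write_no_sigpipe(int fd, const void* buf, size_t len) {
  sigset_t pipe_set, old_mask, pending;
  sigemptyset(&pipe_set);
  sigaddset(&pipe_set, SIGPIPE);

  // Block SIGPIPE for this thread, remembering the previous mask.
  pthread_sigmask(SIG_BLOCK, &pipe_set, &old_mask);

  // A SIGPIPE that was already pending is not ours; leave it alone.
  sigpending(&pending);
  const bool was_pending = sigismember(&pending, SIGPIPE) == 1;

  const ssize_t n = write(fd, buf, len);
  const int saved_errno = errno;

  if (n < 0 && saved_errno == EPIPE && !was_pending) {
    // This write raised SIGPIPE; consume it while it is still blocked so it
    // is not delivered when the old mask is restored.
    sigpending(&pending);
    if (sigismember(&pending, SIGPIPE) == 1) {
      int sig = 0;
      sigwait(&pipe_set, &sig);
    }
  }

  pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
  errno = saved_errno;
  return n;
}

// Usage: write to a socket whose peer has been closed. Returns the errno the
// caller sees (EPIPE expected), and the process keeps running.
int closed_peer_write_errno() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return errno;
  close(fds[1]);
  const char byte = 'x';
  int err = 0;
  if (write_no_sigpipe(fds[0], &byte, 1) < 0) err = errno;
  close(fds[0]);
  return err;
}
```

Calling `signal(SIGPIPE, SIG_IGN)` once would be simpler, but it changes the disposition for the whole embedding process, which a library loaded into R probably should not do unilaterally. Note also that a wrapper like this only covers writes Arrow performs itself, not the ones libcurl makes internally.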
> [R] Weird R error: Error in
> fs___FileSystem__GetTargetInfos_FileSelector(self, x) : ignoring SIGPIPE
> signal
> --------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-16680
> URL: https://issues.apache.org/jira/browse/ARROW-16680
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Carl Boettiger
> Priority: Major
>
> Okay, apologies: this is a bit of a weird error, but it is annoying the heck
> out of me. The following block of R code, when run with Rscript (or embedded
> in any form of Rmd, quarto, or knitr doc), produces the error below (at least
> most of the time):
>
> {code:java}
> library(arrow)
> library(dplyr)
> Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
> Sys.unsetenv("AWS_ACCESS_KEY_ID")
> Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
> Sys.unsetenv("AWS_DEFAULT_REGION")
> Sys.unsetenv("AWS_S3_ENDPOINT")
> s3 <- arrow::s3_bucket(bucket = "scores/parquet",
>                        endpoint_override = "data.ecoforecast.org")
> ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
> ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()
> {code}
> Gives the error
>
>
> {code:java}
> Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) :
> ignoring SIGPIPE signal
> Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector
> {code}
> But only when run as a script! When run interactively in an R console, this
> code runs just fine. Even as a script the code appears to do its work, but it
> seems to trigger a SIGPIPE for reasons I don't understand.
> If the script is executed with littler
> ([https://dirk.eddelbuettel.com/code/littler.html]), then it runs fine, since
> littler handles SIGPIPE but Rscript doesn't. But I have no idea why the above
> code raises a SIGPIPE in the first place. Worse, if I choose a different
> filter for the above, like "aquatics", it (usually) works without the error.
> I have no idea why `fs___FileSystem__GetTargetInfos_FileSelector` results in
> this, but I would really appreciate any hints on how to avoid it, as it makes
> it very hard to use arrow in workflows right now!
>
> thanks for all you do!
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)