[
https://issues.apache.org/jira/browse/ARROW-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575261#comment-17575261
]
Dewey Dunnington commented on ARROW-16680:
------------------------------------------
Thanks for keeping on this!
FWIW, the deletion that seems to cause the sigpipe happens here:
https://github.com/aws/aws-sdk-cpp/blob/main/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp#L25-L33
...and there is a way to disable sigpipe errors that was broken at one point:
https://github.com/curl/curl/issues/3138 . That issue described a race
condition that happens when objects get deleted that triggers a sigpipe, which
seems consistent with what you're seeing (intermittent failure coming from a
deleter).
That fix looks like it was in CURL 7.62.0, and Ubuntu focal is at 7.68.0 at
least (and you're running on newer Ubuntu than that).
It *seems* like adding {{curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);}} right here
https://github.com/aws/aws-sdk-cpp/blob/main/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp#L89
might work?
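
For context, here is a minimal sketch of the underlying mechanism in plain POSIX C. It is not code from curl or the AWS SDK, and {{write_to_closed_pipe}} is a hypothetical helper name. It writes to a pipe whose read end is already closed: with the default SIGPIPE disposition that write would kill the process, but with SIGPIPE ignored the write just fails with EPIPE, which is roughly the behaviour a "no SIGPIPE" setting is meant to guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): write to a pipe that has no
 * reader. Returns the errno set by write(), or 0 if write() succeeded. */
int write_to_closed_pipe(void) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    /* Ignore SIGPIPE for the duration of the write. With the default
     * disposition, the write below would terminate the process. */
    void (*previous)(int) = signal(SIGPIPE, SIG_IGN);

    close(fds[0]); /* close the read end: nobody can read from the pipe */

    errno = 0;
    ssize_t n = write(fds[1], "x", 1);
    int err = (n == -1) ? errno : 0;

    close(fds[1]);
    signal(SIGPIPE, previous); /* restore the previous disposition */
    return err;
}
```

Calling {{write_to_closed_pipe()}} should return EPIPE rather than terminating the caller. A process that installs its own SIGPIPE handler instead of ignoring the signal would see that handler run at the same point.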
> [R] Weird R error: Error in
> fs___FileSystem__GetTargetInfos_FileSelector(self, x) : ignoring SIGPIPE
> signal
> --------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-16680
> URL: https://issues.apache.org/jira/browse/ARROW-16680
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Carl Boettiger
> Priority: Major
>
> Okay apologies, this is a bit of a weird error but is annoying the heck out
> of me. The following block of all R code, when run with Rscript (or embedded
> into any form of Rmd, quarto, knitr doc) produces the error below (at least
> most of the time):
>
> {code:java}
> library(arrow)
> library(dplyr){code}
> {code:java}
> Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
> Sys.unsetenv("AWS_ACCESS_KEY_ID")
> Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
> Sys.unsetenv("AWS_DEFAULT_REGION")
> Sys.unsetenv("AWS_S3_ENDPOINT")
> s3 <- arrow::s3_bucket(bucket = "scores/parquet",
>                        endpoint_override = "data.ecoforecast.org")
> ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
> ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()
> {code}
> Gives the error
>
>
> {code:java}
> Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) :
> ignoring SIGPIPE signal
> Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector
> {code}
> But only when run as a script! When run interactively in an R console, this
> code runs just fine. Even as a script the code seems to run fine, but
> erroneously seems to be attempting this sigpipe I don't understand.
> If the script is executed with littler
> ([https://dirk.eddelbuettel.com/code/littler.html]) then it runs fine, since
> littler handles sigpipe but Rscripts don't. But I have no idea why the above
> code throws a pipe in the first place. Worse, if I choose a different filter
> for the above, like "aquatics", it (usually) works without the error.
> I have no idea why `fs___FileSystem__GetTargetInfos_FileSelector` results in
> this, but would really appreciate any hints on how to avoid this as it makes
> it very hard to use arrow in workflows right now!
>
> thanks for all you do!
>