[
https://issues.apache.org/jira/browse/ARROW-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575264#comment-17575264
]
Neal Richardson commented on ARROW-16680:
-----------------------------------------
Is it possible for [~cboettig] to set that curl option outside of the aws-sdk
(like the curl timeout issue) and prove out that it works?
If so, at a minimum we could introduce a patch step in our aws-sdk-cpp build to
add that in, as well as upstream the fix (IDK when we'll upgrade to the latest
aws-sdk-cpp even if they do accept the PR).
> [R] Weird R error: Error in
> fs___FileSystem__GetTargetInfos_FileSelector(self, x) : ignoring SIGPIPE
> signal
> --------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-16680
> URL: https://issues.apache.org/jira/browse/ARROW-16680
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Carl Boettiger
> Priority: Major
>
> Okay, apologies, this is a bit of a weird error but it is annoying the heck
> out of me. The following block of plain R code, when run with Rscript (or
> embedded in any Rmd, quarto, or knitr document), produces the error below (at
> least most of the time):
>
> {code:java}
> library(arrow)
> library(dplyr){code}
> {code:java}
> Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
> Sys.unsetenv("AWS_ACCESS_KEY_ID")
> Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
> Sys.unsetenv("AWS_DEFAULT_REGION")
> Sys.unsetenv("AWS_S3_ENDPOINT")
> s3 <- arrow::s3_bucket(bucket = "scores/parquet",
>                        endpoint_override = "data.ecoforecast.org")
> ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
> ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()
> {code}
> Gives the error
>
>
> {code:java}
> Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) :
> ignoring SIGPIPE signal
> Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector
> {code}
> But only when run as a script! When run interactively in an R console, this
> code runs just fine. Even as a script the code seems to run fine, but it
> also seems to erroneously trigger this SIGPIPE, which I don't understand.
> If the script is executed with littler
> ([https://dirk.eddelbuettel.com/code/littler.html]) then it runs fine, since
> littler handles SIGPIPE but Rscript doesn't. But I have no idea why the above
> code raises a SIGPIPE in the first place. Worse, if I choose a different filter
> for the above, like "aquatics", it (usually) works without the error.
> I have no idea why `fs___FileSystem__GetTargetInfos_FileSelector` results in
> this, but would really appreciate any hints on how to avoid this as it makes
> it very hard to use arrow in workflows right now!
>
> thanks for all you do!
>
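For readers unfamiliar with the mechanism discussed above: on POSIX systems, SIGPIPE is delivered when a process writes to a pipe or socket whose other end has already closed. The following is a minimal sketch in Python (not R, and not code from Arrow, curl, or aws-sdk-cpp) of what ignoring that signal does: the write fails with an EPIPE error that the caller can handle, instead of the signal reaching the process. The helper name `write_to_closed_pipe` is hypothetical and exists only for this illustration.

```python
import errno
import os
import signal


def write_to_closed_pipe(disposition):
    """Write one byte to a pipe whose read end is closed.

    SIGPIPE is set to `disposition` for the duration of the write.
    Returns the errno of the failed write, or None if the write succeeded.
    POSIX only: signal.SIGPIPE does not exist on Windows.
    """
    previous = signal.signal(signal.SIGPIPE, disposition)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader left, so any write now hits a broken pipe
    try:
        os.write(write_fd, b"x")
        return None
    except OSError as exc:
        # With SIGPIPE ignored, the kernel reports the failure as EPIPE.
        return exc.errno
    finally:
        os.close(write_fd)
        if previous is not None:
            # Restore whatever disposition was in place before the call.
            signal.signal(signal.SIGPIPE, previous)


result = write_to_closed_pipe(signal.SIG_IGN)
print(result == errno.EPIPE)
```

Whether the curl option mentioned in the comment above has the same effect inside aws-sdk-cpp is not something this sketch demonstrates; it only shows the general signal behaviour.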