[
https://issues.apache.org/jira/browse/ARROW-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574916#comment-17574916
]
Carl Boettiger commented on ARROW-16680:
----------------------------------------
Hi arrow devs, apologies that this one is hard to write a reprex for, but this
issue is still killing me. It happens when running as an external command
(Rscript, knitr, and now quarto as well) for most non-trivial scripts that
touch S3 using arrow. At the moment, the only successful workaround I've found
has been using littler's `r` instead of `Rscript`, since littler understands
SIGPIPE and thus doesn't error under these conditions. Unfortunately that does
not help for standard workflows that rely on quarto, blogdown, or the many
other tools in the RStudio markdown ecosystem, which all interpret this as an
error.
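To make the littler workaround concrete, here is a rough sketch (illustrative
only; it assumes littler's `r` binary is on the PATH, and the script body is
condensed from the S3 code in the original report quoted below):
{code:java}
# Write the problematic S3 code (condensed from the issue below) to a script
writeLines('
library(arrow)
Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
Sys.unsetenv("AWS_ACCESS_KEY_ID")
Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
s3 <- s3_bucket("scores/parquet", endpoint_override = "data.ecoforecast.org")
ds <- open_dataset(s3, partitioning = c("theme", "year"))
print(dplyr::collect(dplyr::filter(ds, theme == "phenology")))
', "sigpipe.R")

# Running through Rscript intermittently dies with "ignoring SIGPIPE signal"
system2("Rscript", "sigpipe.R")

# Running through littler's `r` completes, since littler handles SIGPIPE
system2("r", "sigpipe.R")
{code}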
Here's another attempt at a reprex:
{code:java}
download.file("https://github.com/cboettig/forecasts-darts-framework/raw/main/weather-covariates.qmd",
"sigpipe.qmd")
quarto::quarto_render("sigpipe.qmd")
# ...
#> $ include: logi FALSE
#>
#>
#> output file: sigpipe.knit.md
#>
#> Error: ignoring SIGPIPE signal
#> Execution halted
#> Error in `processx::run(quarto_bin, args, echo = TRUE)`: ! System command
#>   'quarto' failed
{code}
Created on 2022-08-03 by the reprex package (https://reprex.tidyverse.org) (v2.0.1)
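(Purely as a sketch of the kind of band-aid I have in mind, not a real fix:
something like the hypothetical helper below could catch the intermittent
"ignoring SIGPIPE signal" condition and retry the collect, though it does
nothing about the underlying signal handling.)
{code:java}
# Hypothetical helper, sketch only: retry a collect() when the intermittent
# "ignoring SIGPIPE signal" error surfaces; any other error is re-thrown.
retry_collect <- function(query, tries = 5) {
  for (i in seq_len(tries)) {
    result <- tryCatch(dplyr::collect(query), error = identity)
    if (!inherits(result, "error")) return(result)
    if (!grepl("SIGPIPE", conditionMessage(result))) stop(result)
    Sys.sleep(1)  # brief pause before retrying
  }
  stop(result)
}

# e.g. retry_collect(dplyr::filter(ds, theme == "phenology"))
{code}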
> [R] Weird R error: Error in
> fs___FileSystem__GetTargetInfos_FileSelector(self, x) : ignoring SIGPIPE
> signal
> --------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-16680
> URL: https://issues.apache.org/jira/browse/ARROW-16680
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Carl Boettiger
> Priority: Major
>
> Okay, apologies, this is a bit of a weird error, but it is annoying the heck
> out of me. The following block of R code, when run with Rscript (or embedded
> in any form of Rmd, quarto, or knitr document), produces the error below (at
> least most of the time):
>
> {code:java}
> library(arrow)
> library(dplyr)
>
> Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
> Sys.unsetenv("AWS_ACCESS_KEY_ID")
> Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
> Sys.unsetenv("AWS_DEFAULT_REGION")
> Sys.unsetenv("AWS_S3_ENDPOINT")
>
> s3 <- arrow::s3_bucket(bucket = "scores/parquet",
>                        endpoint_override = "data.ecoforecast.org")
> ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
> ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()
> {code}
> Gives the error
>
>
> {code:java}
> Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) :
> ignoring SIGPIPE signal
> Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector
> {code}
> But only when run as a script! When run interactively in an R console, this
> code runs just fine. Even as a script the code seems to do its work, but then
> errors out on a SIGPIPE that I don't understand.
> If the script is executed with littler
> ([https://dirk.eddelbuettel.com/code/littler.html]) then it runs fine, since
> littler handles SIGPIPE but Rscript doesn't. But I have no idea why the above
> code triggers a SIGPIPE in the first place. Worse, if I choose a different
> filter for the above, like "aquatics", it (usually) works without the error.
> I have no idea why `fs___FileSystem__GetTargetInfos_FileSelector` results in
> this, but I would really appreciate any hints on how to avoid it, as it makes
> arrow very hard to use in workflows right now!
>
> thanks for all you do!
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)