[ 
https://issues.apache.org/jira/browse/ARROW-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575261#comment-17575261
 ] 

Dewey Dunnington commented on ARROW-16680:
------------------------------------------

Thanks for keeping on this!

FWIW, the deletion that seems to cause the sigpipe happens here: 
https://github.com/aws/aws-sdk-cpp/blob/main/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp#L25-L33

...and there is a way to disable sigpipe errors that was broken at one point: 
https://github.com/curl/curl/issues/3138 . That issue described a race 
condition that happens when objects get deleted that triggers a sigpipe, which 
seems consistent with what you're seeing (intermittent failure coming from a 
deleter).
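
For anyone following along, here is a minimal sketch of the general POSIX mechanism behind that kind of failure. It is not the curl or aws-sdk-cpp code, and the helper name {{write_to_closed_pipe}} is made up for illustration. Writing to a pipe (or socket) whose other end has already gone away raises SIGPIPE, which kills the process by default. If the signal is ignored, the same write just fails with EPIPE:

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper for illustration only; not part of curl or the AWS SDK.
// Writes one byte to a pipe whose read end is already closed and returns the
// errno reported by write(), or 0 if the write unexpectedly succeeded.
int write_to_closed_pipe() {
    // Ignore SIGPIPE process-wide. With the default disposition, the write
    // below would terminate the process instead of returning an error.
    std::signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (pipe(fds) != 0) {
        return errno;  // could not create the pipe at all
    }

    // Close the read end so there is no reader left, similar to a peer
    // that has already closed its side of a connection.
    close(fds[0]);

    const char byte = 'x';
    errno = 0;
    ssize_t n = write(fds[1], &byte, 1);
    int err = (n == -1) ? errno : 0;

    close(fds[1]);
    return err;
}
```

With SIGPIPE ignored, the call is expected to return EPIPE rather than ending the process, which is roughly the difference between a handled error and the intermittent failure described in the issue.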

That fix looks like it was in CURL 7.62.0, and Ubuntu focal is at 7.68.0 at 
least (and you're running on newer Ubuntu than that).

It *seems* like adding {{curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);}} right here 
https://github.com/aws/aws-sdk-cpp/blob/main/aws-cpp-sdk-core/source/http/curl/CurlHandleContainer.cpp#L89
 might work?

> [R] Weird R error: Error in 
> fs___FileSystem__GetTargetInfos_FileSelector(self, x) :    ignoring SIGPIPE 
> signal
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16680
>                 URL: https://issues.apache.org/jira/browse/ARROW-16680
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 8.0.0
>            Reporter: Carl Boettiger
>            Priority: Major
>
> Okay apologies, this is a bit of a weird error but is annoying the heck out 
> of me.  The following block of all R code, when run with Rscript (or embedded 
> into any form of Rmd, quarto, knitr doc) produces the error below (at least 
> most of the time):
>  
> {code:java}
> library(arrow)
> library(dplyr){code}
> {code:java}
> Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
> Sys.unsetenv("AWS_ACCESS_KEY_ID")
> Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
> Sys.unsetenv("AWS_DEFAULT_REGION")
> Sys.unsetenv("AWS_S3_ENDPOINT")
> s3 <- arrow::s3_bucket(bucket = "scores/parquet",
>                        endpoint_override = "data.ecoforecast.org")
> ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
> ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()
> {code}
> Gives the error
>  
>  
> {code:java}
> Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) : 
>   ignoring SIGPIPE signal
> Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector 
> {code}
> But only when run as a script! When run interactively in an R console, this 
> code runs just fine.  Even as a script the code seems to run fine, but 
> erroneously seems to be attempting this sigpipe I don't understand.  
> If the script is executed with littler 
> ([https://dirk.eddelbuettel.com/code/littler.html]) then it runs fine, since 
> littler handles sigpipe but Rscripts don't.  But I have no idea why the above 
> code throws a pipe in the first place.  Worse, if I choose a different filter 
> for the above, like "aquatics", it (usually) works without the error.  
> I have no idea why `fs___FileSystem__GetTargetInfos_FileSelector` results in 
> this, but would really appreciate any hints on how to avoid this as it makes 
> it very hard to use arrow in workflows right now! 
>  
> thanks for all you do!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
