[ 
https://issues.apache.org/jira/browse/ARROW-17652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dewey Dunnington updated ARROW-17652:
-------------------------------------
    Summary: [R] S3 credentials not being picked up on Centos7  (was: [R] R 
Arrow install fails at Thrift build step)

> [R] S3 credentials not being picked up on Centos7
> -------------------------------------------------
>
>                 Key: ARROW-17652
>                 URL: https://issues.apache.org/jira/browse/ARROW-17652
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.0
>         Environment: Amazon Linux 2, R 4.1, Arrow 9.0.0
>            Reporter: Adam Giles
>            Priority: Major
>         Attachments: error.log
>
>
> We use EC2 machines to read Parquet datasets from S3, and one user cannot 
> get this to work. Initially, reading Parquet files from S3 failed with what 
> looked like an AWS credentials error:
> {code:R}
> Error: IOError: When getting information for key 'X' in bucket 'Y': AWS Error 
> [code 15]: No response body.
> {code}
> The AWS CLI doesn't have a permissions problem, nor does the `pyarrow` 
> package in Python, and neither do other packages in R (e.g. `paws`). I 
> assumed the cause was an Arrow package install without full S3 support. 
> Several variations of reinstalling from source all failed while building 
> Thrift, which in turn seemed to be caused by the Boost libraries not being 
> found (see the attached error log).
> The usual approach we use, which worked on other machines:
> {code:R}
> Sys.setenv("LIBARROW_MINIMAL" = FALSE)
> install.packages("arrow")
> {code}
> Trying to use an Arrow binary (I think this is probably not the right way, 
> and it just falls back to building from source):
> {code:R}
> # I'm not sure this is actually the correct use
> Sys.setenv("LIBARROW_BINARY" = TRUE)
> Sys.setenv("LIBARROW_MINIMAL" = FALSE)
> install.packages("arrow")
> {code}
> Following some advice in another issue that I now can't find, I tried 
> specifying the compilers:
> {code:R}
> Sys.setenv(CC="/usr/bin/gcc")
> Sys.setenv(CXX="/usr/bin/g++")
> Sys.setenv(LIBARROW_MINIMAL="FALSE")
> Sys.setenv(LIBARROW_BINARY="FALSE")
> Sys.setenv(ARROW_R_DEV="TRUE")
> install.packages("arrow")
> {code}
> We've also tried getting the missing dependencies from `yum`:
> {code:sh}
> yum install -y libcurl-devel
> yum install -y openssl-devel
> yum install -y thrift
> yum install -y boost boost-thread boost-devel
> {code}
> But the build still fails at the same point.
> If we install from the RSPM repo for CentOS 7 (we're running Amazon Linux 2), 
> the package appears to install cleanly, but we still get the AWS Error 
> message when trying to read from an S3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
