[ https://issues.apache.org/jira/browse/ARROW-17652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dewey Dunnington updated ARROW-17652:
-------------------------------------
Summary: [R] S3 credentials not being picked up on Centos7 (was: [R] R
Arrow install fails at Thrift build step)
> [R] S3 credentials not being picked up on Centos7
> -------------------------------------------------
>
> Key: ARROW-17652
> URL: https://issues.apache.org/jira/browse/ARROW-17652
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 9.0.0
> Environment: Amazon Linux 2, R 4.1, Arrow 9.0.0
> Reporter: Adam Giles
> Priority: Major
> Attachments: error.log
>
>
> We use EC2 machines to read Parquet datasets from S3. One user cannot get
> this to work. Initially, reading Parquet files from S3 failed with what
> looked like an AWS credentials error:
> {code:R}
> Error: IOError: When getting information for key 'X' in bucket 'Y': AWS Error
> [code 15]: No response body.
> {code}
> The AWS CLI has no permissions problem, nor does the `pyarrow` package in
> Python or other R packages (e.g. `paws`). I assumed the cause was an Arrow
> package installed without full S3 support. Several attempts to reinstall
> from source all failed while building Thrift, which in turn seemed to be
> caused by the build not finding the Boost libraries; see the attached error
> log.
> Our usual approach, which worked on other machines:
> {code:R}
> Sys.setenv("LIBARROW_MINIMAL" = FALSE)
> install.packages("arrow")
> {code}
> Trying to use an Arrow binary. This is probably not the right way to do it,
> and I think it just falls back to building from source:
> {code:R}
> Sys.setenv("LIBARROW_BINARY" = TRUE) # I'm not sure this is actually the correct use
> Sys.setenv("LIBARROW_MINIMAL" = FALSE)
> install.packages("arrow")
> {code}
> Following advice from another issue that I can no longer find, I tried
> specifying the compilers:
> {code:R}
> Sys.setenv(CC="/usr/bin/gcc")
> Sys.setenv(CXX="/usr/bin/g++")
> Sys.setenv(LIBARROW_MINIMAL="FALSE")
> Sys.setenv(LIBARROW_BINARY="FALSE")
> Sys.setenv(ARROW_R_DEV="TRUE")
> install.packages("arrow")
> {code}
> We've also tried getting the missing dependencies from `yum`:
> {code:sh}
> yum install -y libcurl-devel
> yum install -y openssl-devel
> yum install -y thrift
> yum install -y boost boost-thread boost-devel
> {code}
> But we are still failing at the same point.
> If we install from the RSPM repo for CentOS 7 (we're running Amazon Linux 2),
> the package appears to install cleanly, but we still get the AWS error
> message when trying to read from an S3 bucket.
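
As a first diagnostic for the credentials error described above, it may help to check which of the standard AWS credential sources are visible in the environment that launches R. The sketch below assumes a POSIX shell and covers only the credential environment variables and the shared credentials file; it does not query an EC2 instance profile. `check_aws_credential_sources` is a hypothetical helper name, not part of the AWS CLI or Arrow. It reports only whether a source is present and never prints secret values.

```shell
#!/bin/sh
# Hypothetical helper: report which AWS credential sources this shell can see.
# It checks presence only and never echoes the secret values themselves.
check_aws_credential_sources() {
    # 1. Environment variables (both key ID and secret key must be set).
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "env: set"
    else
        echo "env: not set"
    fi

    # 2. Shared credentials file; AWS_SHARED_CREDENTIALS_FILE overrides
    #    the default location of ~/.aws/credentials.
    creds_file="${AWS_SHARED_CREDENTIALS_FILE:-${HOME:-}/.aws/credentials}"
    if [ -r "$creds_file" ]; then
        echo "file: $creds_file"
    else
        echo "file: not found"
    fi
}

check_aws_credential_sources
```

Running this as the affected user, from the same shell that starts R, shows whether either source is available there. If both report as missing, the machine is presumably relying on an instance profile, which this sketch does not test. An R session started another way (for example by a server process) may not inherit the same environment, so the result is only a hint.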