This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch maint-23.0.1-r
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit 8051caf7c4289728eb68eac140eb4a2840ed3e25
Author: Jonathan Keane <[email protected]>
AuthorDate: Tue Feb 3 16:20:45 2026 -0600

    GH-49067: [R] Disable GCS on macos (#49068)

    ### Rationale for this change

    Builds that complete on CRAN

    ### What changes are included in this PR?

    Disable GCS by default

    ### Are these changes tested?

    ### Are there any user-facing changes?

    Hopefully not

    * GitHub Issue: #49067

    ---------

    Co-authored-by: Nic Crane <[email protected]>
---
 compose.yaml                               |   4 +-
 dev/tasks/r/github.packages.yml            |   1 -
 r/tools/nixlibs.R                          |   2 +-
 r/vignettes/developers/binary_features.Rmd | 193 +++++++++++++++++++++++++++++
 4 files changed, 197 insertions(+), 3 deletions(-)

diff --git a/compose.yaml b/compose.yaml
index 31bc5c81b9..8e908975df 100644
--- a/compose.yaml
+++ b/compose.yaml
@@ -441,7 +441,9 @@ services:
       ARROW_HOME: /arrow
       ARROW_DEPENDENCY_SOURCE: BUNDLED
       LIBARROW_MINIMAL: "false"
-      ARROW_MIMALLOC: "ON"
+      # explicitly enable GCS when we build libarrow so that binary libarrow
+      # users get more fully-featured builds
+      ARROW_GCS: "ON"
     volumes: *ubuntu-volumes
     command: &cpp-static-command
       /bin/bash -c "
diff --git a/dev/tasks/r/github.packages.yml b/dev/tasks/r/github.packages.yml
index cedb567f2c..40d3457292 100644
--- a/dev/tasks/r/github.packages.yml
+++ b/dev/tasks/r/github.packages.yml
@@ -81,7 +81,6 @@ jobs:
         env:
         {{ macros.github_set_sccache_envvars()|indent(8) }}
           MACOSX_DEPLOYMENT_TARGET: "11.6"
-          ARROW_S3: ON
           ARROW_GCS: ON
           ARROW_DEPENDENCY_SOURCE: BUNDLED
           CMAKE_GENERATOR: Ninja
diff --git a/r/tools/nixlibs.R b/r/tools/nixlibs.R
index f4ccb4956a..151dd47f5d 100644
--- a/r/tools/nixlibs.R
+++ b/r/tools/nixlibs.R
@@ -597,7 +597,7 @@ build_libarrow <- function(src_dir, dst_dir) {
     env_var_list <- c(
       env_var_list,
       ARROW_S3 = Sys.getenv("ARROW_S3", "ON"),
-      ARROW_GCS = Sys.getenv("ARROW_GCS", "ON"),
+      # ARROW_GCS = Sys.getenv("ARROW_GCS", "ON"),
       ARROW_WITH_ZSTD = Sys.getenv("ARROW_WITH_ZSTD", "ON")
     )
   }
diff --git a/r/vignettes/developers/binary_features.Rmd b/r/vignettes/developers/binary_features.Rmd
new file mode 100644
index 0000000000..ed6c7180f5
--- /dev/null
+++ b/r/vignettes/developers/binary_features.Rmd
@@ -0,0 +1,193 @@
+---
+title: "Libarrow binary features"
+description: >
+  Understanding which C++ features are enabled in Arrow R package builds
+output: rmarkdown::html_vignette
+---
+
+This document explains which C++ features are enabled in different Arrow R
+package build configurations, and documents the decisions behind our default
+feature set. This is intended as internal developer documentation for understanding
+which features are enabled in which builds. It is not intended to be a guide for
+installing the Arrow R package; for that, see the
+[installation guide](../../install.html).
+
+## Overview
+
+When the Arrow R package is installed, it needs a copy of the Arrow C++ library
+(libarrow). This can come from:
+
+1. **Prebuilt binaries** we host (for releases and nightlies)
+2. **Source builds** when binaries aren't available or users opt out
+
+The features available in libarrow depend on how it was built. This document
+covers the feature configuration for both scenarios.
+
+## Prebuilt libarrow binary configuration
+
+We produce prebuilt libarrow binaries for macOS, Windows, and Linux. These
+binaries include **more features** than the default source build to provide
+users with a fully-featured experience out of the box.
+
+### Current binary feature set
+
+| Platform | S3 | GCS | Configured in |
+|----------|----|----|---------------|
+| macOS (ARM64, x86_64) | ON | ON | `dev/tasks/r/github.packages.yml` |
+| Windows | ON | ON | `ci/scripts/PKGBUILD` |
+| Linux (x86_64) | ON | ON | `compose.yaml` (`ubuntu-cpp-static`) |
+
+### Exceptions to our build defaults
+
+Even though GCS defaults to OFF for source builds, we explicitly enable it in
+our prebuilt binaries because:
+
+1. **Binary users expect features to "just work"** - they shouldn't need to
+   rebuild from source to access cloud storage
+2. **Build time is not a concern** - we build binaries once in CI, not on
+   user machines
+3. **Parity across platforms** - users get the same features regardless of OS
+
+
+## Feature configuration in source builds of libarrow
+
+Source builds are controlled by `r/inst/build_arrow_static.sh`. The key
+environment variable is `LIBARROW_MINIMAL`:
+
+- `LIBARROW_MINIMAL` unset: Default feature set (Parquet, Dataset, JSON, common compression ON; S3/GCS/jemalloc OFF)
+- `LIBARROW_MINIMAL=false`: Full feature set (adds S3, jemalloc, additional compression)
+- `LIBARROW_MINIMAL=true`: Truly minimal (disables Parquet, Dataset, JSON, most compression, SIMD)
+
+### Features always enabled
+
+These features are always built regardless of `LIBARROW_MINIMAL`:
+
+| Feature | CMake Flag | Notes |
+|---------|------------|-------|
+| Compute | `ARROW_COMPUTE=ON` | Core compute functions |
+| CSV | `ARROW_CSV=ON` | CSV reading/writing |
+| Filesystem | `ARROW_FILESYSTEM=ON` | Local filesystem support |
+| JSON | `ARROW_JSON=ON` | JSON reading |
+| Parquet | `ARROW_PARQUET=ON` | Parquet file format |
+| Dataset | `ARROW_DATASET=ON` | Multi-file datasets |
+| Acero | `ARROW_ACERO=ON` | Query execution engine |
+| Mimalloc | `ARROW_MIMALLOC=ON` | Memory allocator |
+| LZ4 | `ARROW_WITH_LZ4=ON` | LZ4 compression |
+| Snappy | `ARROW_WITH_SNAPPY=ON` | Snappy compression |
+| RE2 | `ARROW_WITH_RE2=ON` | Regular expressions |
+| UTF8Proc | `ARROW_WITH_UTF8PROC=ON` | Unicode support |
+
+### Features controlled by LIBARROW_MINIMAL
+
+When `LIBARROW_MINIMAL=false`, the following additional features are enabled
+(via `$ARROW_DEFAULT_PARAM=ON`):
+
+| Feature | CMake Flag | Default |
+|---------|------------|---------|
+| S3 | `ARROW_S3` | `$ARROW_DEFAULT_PARAM` |
+| Jemalloc | `ARROW_JEMALLOC` | `$ARROW_DEFAULT_PARAM` |
+| Brotli | `ARROW_WITH_BROTLI` | `$ARROW_DEFAULT_PARAM` |
+| BZ2 | `ARROW_WITH_BZ2` | `$ARROW_DEFAULT_PARAM` |
+| Zlib | `ARROW_WITH_ZLIB` | `$ARROW_DEFAULT_PARAM` |
+| Zstd | `ARROW_WITH_ZSTD` | `$ARROW_DEFAULT_PARAM` |
+
+### Features that require explicit opt-in
+
+GCS (Google Cloud Storage) is **always off by default**, even when
+`LIBARROW_MINIMAL=false`:
+
+| Feature | CMake Flag | Default | Reason |
+|---------|------------|---------|--------|
+| GCS | `ARROW_GCS` | `OFF` | Build complexity, dependency size |
+
+To enable GCS in a source build, you must explicitly set `ARROW_GCS=ON`.
+
+**Why is GCS off by default?**
+
+GCS was turned off by default in [#48343](https://github.com/apache/arrow/pull/48343)
+(December 2025) because:
+
+1. Building google-cloud-cpp is fragile and adds significant build time
+2. The dependency on abseil (ABSL) has caused compatibility issues
+3. Users who need GCS can still enable it explicitly
+
+## Configuration file locations
+
+### libarrow source build configuration
+
+The main build script that controls source builds:
+
+**`r/inst/build_arrow_static.sh`** - CMake flags and defaults
+([view source](https://github.com/apache/arrow/blob/main/r/inst/build_arrow_static.sh))
+the environment variables to look for are `LIBARROW_MINIMAL`, `ARROW_*`, and, `ARROW_DEFAULT_PARAM`
+
+### libarrow binary build configuration
+
+Each platform has its own configuration file:
+
+| Platform | Config file | Key settings |
+|----------|-------------|--------------|
+| macOS | `dev/tasks/r/github.packages.yml` | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` |
+| Windows | `ci/scripts/PKGBUILD` | `ARROW_GCS=ON`, `ARROW_S3=ON` |
+| Linux | `compose.yaml` (`ubuntu-cpp-static`) | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` |
+
+## R-universe builds
+
+[R-universe](https://apache.r-universe.dev/arrow) builds the Arrow R package
+for users who want newer versions than CRAN. R-universe behavior varies by
+platform and architecture:
+
+| Platform | Architecture | Build method | Features |
+|----------|--------------|--------------|----------|
+| macOS | ARM64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| macOS | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| Windows | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| Windows | ARM64 | Not supported | NA |
+| Linux | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| Linux | ARM64 | Builds from source | S3 only (no GCS) |
+
+### Why Linux ARM64 builds from source
+
+We only publish prebuilt Linux binaries for x86_64 architecture. The binary
+selection logic in `r/tools/nixlibs.R` (line 263) explicitly checks for this:
+
+```r
+if (identical(os, "darwin") || (identical(os, "linux") && identical(arch, "x86_64"))) {
+```
+When R-universe builds on Linux ARM64 runners, no binary is available, so it
+falls back to building from source using `build_arrow_static.sh`. Since GCS
+defaults to OFF in that script, Linux ARM64 users don't get GCS support.
+
+### Enabling GCS for Linux ARM64
+
+To provide full feature parity for Linux ARM64, we would need to:
+
+1. Add an ARM64 Linux build job to `dev/tasks/r/github.packages.yml`
+2. Update `select_binary()` in `nixlibs.R` to recognize `linux-aarch64`
+3. Add the artifact pattern to `dev/tasks/tasks.yml`
+4. Update the nightly upload workflow
+
+See [GH-36193](https://github.com/apache/arrow/issues/36193) for tracking this work.
+
+Alternatively, changing the GCS default in `build_arrow_static.sh` from `OFF`
+to `$ARROW_DEFAULT_PARAM` would enable GCS for all source builds, including
+Linux ARM64 on R-universe.
+
+## Checking installed features
+
+Users can check which features are enabled in their installation:
+
+```r
+# Show all capabilities
+arrow::arrow_info()
+
+# Check specific features
+arrow::arrow_with_s3()
+arrow::arrow_with_gcs()
+```
+
+## Related documentation
+
+- [Installation guide](../install.html) - User-facing installation docs
+- [Installation details](./install_details.html) - How the build system works
+- [Developer setup](./setup.html) - Building Arrow for development
