This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch maint-23.0.1-r
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit 8051caf7c4289728eb68eac140eb4a2840ed3e25
Author: Jonathan Keane <[email protected]>
AuthorDate: Tue Feb 3 16:20:45 2026 -0600

    GH-49067: [R] Disable GCS on macos (#49068)
    
    ### Rationale for this change
    Ensure that builds complete on CRAN.
    
    ### What changes are included in this PR?
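    Disable GCS by default in source builds of libarrow: `r/tools/nixlibs.R`
    no longer sets `ARROW_GCS=ON` when the variable is unset. The prebuilt
    Linux libarrow binaries now enable GCS explicitly in `compose.yaml`, and a
    new developer vignette documents which features each build includes.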
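    
    A minimal shell sketch of why this turns GCS off. It assumes, as the new
    vignette describes, that `r/inst/build_arrow_static.sh` resolves the flag
    as `${ARROW_GCS:-OFF}`, and that a user-set `ARROW_GCS` still reaches the
    script through the inherited environment. The function names are
    hypothetical; this models the defaulting only, not the real build code.
    
    ```shell
    # Hypothetical model of how -DARROW_GCS is resolved, assuming the build
    # script uses ${ARROW_GCS:-OFF} (per the vignette's opt-in table).
    resolve_gcs() {
      # $1: the ARROW_GCS value handed to the build script ("" = unset)
      gcs="$1"
      echo "${gcs:-OFF}"
    }
    
    # Old nixlibs.R: Sys.getenv("ARROW_GCS", "ON") filled in ON when unset.
    old_nixlibs() { resolve_gcs "${ARROW_GCS:-ON}"; }
    # New nixlibs.R: no default is supplied, so only a user value gets through.
    new_nixlibs() { resolve_gcs "${ARROW_GCS:-}"; }
    
    unset ARROW_GCS
    echo "old=$(old_nixlibs) new=$(new_nixlibs)"   # old=ON new=OFF
    ARROW_GCS=ON
    echo "opt-in: new=$(new_nixlibs)"              # opt-in: new=ON
    ```
    
    In other words, users who build libarrow from source and need GCS can
    still opt in by setting `ARROW_GCS=ON`.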
    
    ### Are these changes tested?
    
    ### Are there any user-facing changes?
    Hopefully not
    
    * GitHub Issue: #49067
    
    ---------
    
    Co-authored-by: Nic Crane <[email protected]>
---
 compose.yaml                               |   4 +-
 dev/tasks/r/github.packages.yml            |   1 -
 r/tools/nixlibs.R                          |   2 +-
 r/vignettes/developers/binary_features.Rmd | 193 +++++++++++++++++++++++++++++
 4 files changed, 197 insertions(+), 3 deletions(-)

diff --git a/compose.yaml b/compose.yaml
index 31bc5c81b9..8e908975df 100644
--- a/compose.yaml
+++ b/compose.yaml
@@ -441,7 +441,9 @@ services:
       ARROW_HOME: /arrow
       ARROW_DEPENDENCY_SOURCE: BUNDLED
       LIBARROW_MINIMAL: "false"
-      ARROW_MIMALLOC: "ON"
+      # explicitly enable GCS when we build libarrow so that binary libarrow
+      # users get more fully-featured builds
+      ARROW_GCS: "ON"
     volumes: *ubuntu-volumes
     command: &cpp-static-command
       /bin/bash -c "
diff --git a/dev/tasks/r/github.packages.yml b/dev/tasks/r/github.packages.yml
index cedb567f2c..40d3457292 100644
--- a/dev/tasks/r/github.packages.yml
+++ b/dev/tasks/r/github.packages.yml
@@ -81,7 +81,6 @@ jobs:
         env:
         {{ macros.github_set_sccache_envvars()|indent(8) }}
           MACOSX_DEPLOYMENT_TARGET: "11.6"
-          ARROW_S3: ON
           ARROW_GCS: ON
           ARROW_DEPENDENCY_SOURCE: BUNDLED
           CMAKE_GENERATOR: Ninja
diff --git a/r/tools/nixlibs.R b/r/tools/nixlibs.R
index f4ccb4956a..151dd47f5d 100644
--- a/r/tools/nixlibs.R
+++ b/r/tools/nixlibs.R
@@ -597,7 +597,7 @@ build_libarrow <- function(src_dir, dst_dir) {
     env_var_list <- c(
       env_var_list,
       ARROW_S3 = Sys.getenv("ARROW_S3", "ON"),
-      ARROW_GCS = Sys.getenv("ARROW_GCS", "ON"),
+      # ARROW_GCS = Sys.getenv("ARROW_GCS", "ON"),
       ARROW_WITH_ZSTD = Sys.getenv("ARROW_WITH_ZSTD", "ON")
     )
   }
diff --git a/r/vignettes/developers/binary_features.Rmd b/r/vignettes/developers/binary_features.Rmd
new file mode 100644
index 0000000000..ed6c7180f5
--- /dev/null
+++ b/r/vignettes/developers/binary_features.Rmd
@@ -0,0 +1,193 @@
+---
+title: "Libarrow binary features"
+description: >
+  Understanding which C++ features are enabled in Arrow R package builds
+output: rmarkdown::html_vignette
+---
+
+This document explains which C++ features are enabled in different Arrow R
+package build configurations, and documents the decisions behind our default
+feature set. This is intended as internal developer documentation for understanding
+which features are enabled in which builds. It is not intended to be a guide for
+installing the Arrow R package; for that, see the
+[installation guide](../install.html).
+
+## Overview
+
+When the Arrow R package is installed, it needs a copy of the Arrow C++ library
+(libarrow). This can come from:
+
+1. **Prebuilt binaries** we host (for releases and nightlies)
+2. **Source builds** when binaries aren't available or users opt out
+
+The features available in libarrow depend on how it was built. This document
+covers the feature configuration for both scenarios.
+
+## Prebuilt libarrow binary configuration
+
+We produce prebuilt libarrow binaries for macOS, Windows, and Linux. These
+binaries include **more features** than the default source build to provide
+users with a fully-featured experience out of the box.
+
+### Current binary feature set
+
+| Platform | S3 | GCS | Configured in |
+|----------|----|----|---------------|
+| macOS (ARM64, x86_64) | ON | ON | `dev/tasks/r/github.packages.yml` |
+| Windows | ON | ON | `ci/scripts/PKGBUILD` |
+| Linux (x86_64) | ON | ON | `compose.yaml` (`ubuntu-cpp-static`) |
+
+### Exceptions to our build defaults
+
+Even though GCS defaults to OFF for source builds, we explicitly enable it in
+our prebuilt binaries because:
+
+1. **Binary users expect features to "just work"** - they shouldn't need to
+   rebuild from source to access cloud storage
+2. **Build time is not a concern** - we build binaries once in CI, not on
+   user machines
+3. **Parity across platforms** - users get the same features regardless of OS
+
+
+## Feature configuration in source builds of libarrow
+
+Source builds are controlled by `r/inst/build_arrow_static.sh`. The key
+environment variable is `LIBARROW_MINIMAL`:
+
+- `LIBARROW_MINIMAL` unset: Default feature set (Parquet, Dataset, JSON, common compression ON; S3/GCS/jemalloc OFF)
+- `LIBARROW_MINIMAL=false`: Full feature set (adds S3, jemalloc, additional compression)
+- `LIBARROW_MINIMAL=true`: Truly minimal (disables Parquet, Dataset, JSON, most compression, SIMD)
+
+### Features always enabled
+
+These features are built whether `LIBARROW_MINIMAL` is unset or `false`; their defaults do not depend on `ARROW_DEFAULT_PARAM`:
+
+| Feature | CMake Flag | Notes |
+|---------|------------|-------|
+| Compute | `ARROW_COMPUTE=ON` | Core compute functions |
+| CSV | `ARROW_CSV=ON` | CSV reading/writing |
+| Filesystem | `ARROW_FILESYSTEM=ON` | Local filesystem support |
+| JSON | `ARROW_JSON=ON` | JSON reading |
+| Parquet | `ARROW_PARQUET=ON` | Parquet file format |
+| Dataset | `ARROW_DATASET=ON` | Multi-file datasets |
+| Acero | `ARROW_ACERO=ON` | Query execution engine |
+| Mimalloc | `ARROW_MIMALLOC=ON` | Memory allocator |
+| LZ4 | `ARROW_WITH_LZ4=ON` | LZ4 compression |
+| Snappy | `ARROW_WITH_SNAPPY=ON` | Snappy compression |
+| RE2 | `ARROW_WITH_RE2=ON` | Regular expressions |
+| UTF8Proc | `ARROW_WITH_UTF8PROC=ON` | Unicode support |
+
+### Features controlled by LIBARROW_MINIMAL
+
+When `LIBARROW_MINIMAL=false`, the following additional features are enabled
+(via `ARROW_DEFAULT_PARAM=ON`):
+
+| Feature | CMake Flag | Default |
+|---------|------------|---------|
+| S3 | `ARROW_S3` | `$ARROW_DEFAULT_PARAM` |
+| Jemalloc | `ARROW_JEMALLOC` | `$ARROW_DEFAULT_PARAM` |
+| Brotli | `ARROW_WITH_BROTLI` | `$ARROW_DEFAULT_PARAM` |
+| BZ2 | `ARROW_WITH_BZ2` | `$ARROW_DEFAULT_PARAM` |
+| Zlib | `ARROW_WITH_ZLIB` | `$ARROW_DEFAULT_PARAM` |
+| Zstd | `ARROW_WITH_ZSTD` | `$ARROW_DEFAULT_PARAM` |
+
+### Features that require explicit opt-in
+
+GCS (Google Cloud Storage) is **always off by default**, even when
+`LIBARROW_MINIMAL=false`:
+
+| Feature | CMake Flag | Default | Reason |
+|---------|------------|---------|--------|
+| GCS | `ARROW_GCS` | `OFF` | Build complexity, dependency size |
+
+To enable GCS in a source build, you must explicitly set `ARROW_GCS=ON`.
+
+**Why is GCS off by default?**
+
+GCS was turned off by default in [#48343](https://github.com/apache/arrow/pull/48343)
+(December 2025) because:
+
+1. Building google-cloud-cpp is fragile and adds significant build time
+2. The dependency on abseil (ABSL) has caused compatibility issues
+3. Users who need GCS can still enable it explicitly
+
+## Configuration file locations
+
+### libarrow source build configuration
+
+The main build script that controls source builds:
+
+**`r/inst/build_arrow_static.sh`** - CMake flags and defaults
+([view source](https://github.com/apache/arrow/blob/main/r/inst/build_arrow_static.sh)).
+The environment variables to look for are `LIBARROW_MINIMAL`, `ARROW_*`, and `ARROW_DEFAULT_PARAM`.
+
+### libarrow binary build configuration
+
+Each platform has its own configuration file:
+
+| Platform | Config file | Key settings |
+|----------|-------------|--------------|
+| macOS | `dev/tasks/r/github.packages.yml` | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` |
+| Windows | `ci/scripts/PKGBUILD` | `ARROW_GCS=ON`, `ARROW_S3=ON` |
+| Linux | `compose.yaml` (`ubuntu-cpp-static`) | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` |
+
+## R-universe builds
+
+[R-universe](https://apache.r-universe.dev/arrow) builds the Arrow R package
+for users who want newer versions than CRAN. R-universe behavior varies by
+platform and architecture:
+
+| Platform | Architecture | Build method | Features |
+|----------|--------------|--------------|----------|
+| macOS | ARM64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| macOS | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| Windows | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| Windows | ARM64 | Not supported | NA |
+| Linux | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
+| Linux | ARM64 | Builds from source | S3 only (no GCS) |
+
+### Why Linux ARM64 builds from source
+
+We only publish prebuilt Linux binaries for x86_64 architecture. The binary
+selection logic in `r/tools/nixlibs.R` (line 263) explicitly checks for this:
+
+```r
+if (identical(os, "darwin") || (identical(os, "linux") && identical(arch, "x86_64"))) {
+```
+When R-universe builds on Linux ARM64 runners, no binary is available, so it
+falls back to building from source using `build_arrow_static.sh`. Since GCS
+defaults to OFF in that script, Linux ARM64 users don't get GCS support.
+
+### Enabling GCS for Linux ARM64
+
+To provide full feature parity for Linux ARM64, we would need to:
+
+1. Add an ARM64 Linux build job to `dev/tasks/r/github.packages.yml`
+2. Update `select_binary()` in `nixlibs.R` to recognize `linux-aarch64`
+3. Add the artifact pattern to `dev/tasks/tasks.yml`
+4. Update the nightly upload workflow
+
+See [GH-36193](https://github.com/apache/arrow/issues/36193) for tracking this work.
+
+Alternatively, changing the GCS default in `build_arrow_static.sh` from `OFF`
+to `$ARROW_DEFAULT_PARAM` would enable GCS for all source builds, including
+Linux ARM64 on R-universe.
+
+## Checking installed features
+
+Users can check which features are enabled in their installation:
+
+```r
+# Show all capabilities
+arrow::arrow_info()
+
+# Check specific features
+arrow::arrow_with_s3()
+arrow::arrow_with_gcs()
+```
+
+## Related documentation
+
+- [Installation guide](../install.html) - User-facing installation docs
+- [Installation details](./install_details.html) - How the build system works
+- [Developer setup](./setup.html) - Building Arrow for development
