thisisnic commented on code in PR #49068:
URL: https://github.com/apache/arrow/pull/49068#discussion_r2759816031


##########
r/vignettes/developers/binary_features.Rmd:
##########
@@ -0,0 +1,193 @@
+---
+title: "Libarrow binary features"
+description: >
+  Understanding which C++ features are enabled in Arrow R package builds
+output: rmarkdown::html_vignette
+---
+
+This document explains which C++ features are enabled in different Arrow R
+package build configurations, and documents the decisions behind our default
+feature set. This is intended as internal developer documentation for
+understanding which features are enabled in which builds. It is not intended
+to be a guide for installing the Arrow R package; for that, see the
+[installation guide](../../install.html).
+
+## Overview
+
+When the Arrow R package is installed, it needs a copy of the Arrow C++ library
+(libarrow). This can come from:
+
+1. **Prebuilt binaries** we host (for releases and nightlies)
+2. **Source builds** when binaries aren't available or users opt out
+
+The features available in libarrow depend on how it was built. This document
+covers the feature configuration for both scenarios.
+
+## Prebuilt libarrow binary configuration
+
+We produce prebuilt libarrow binaries for macOS, Windows, and Linux. These
+binaries include **more features** than the default source build to provide
+users with a fully-featured experience out of the box.
+
+### Current binary feature set
+
+| Platform | S3 | GCS | Configured in |
+|----------|----|----|---------------|
+| macOS (ARM64, x86_64) | ON | ON | `dev/tasks/r/github.packages.yml` |
+| Windows | ON | ON | `ci/scripts/PKGBUILD` |
+| Linux (x86_64) | ON | ON | `compose.yaml` (`ubuntu-cpp-static`) |
+
+### Exceptions to our build defaults
+
+Even though GCS defaults to OFF for source builds, we explicitly enable it in
+our prebuilt binaries because:
+
+1. **Binary users expect features to "just work"** - they shouldn't need to
+   rebuild from source to access cloud storage
+2. **Build time is not a concern** - we build binaries once in CI, not on
+   user machines
+3. **Parity across platforms** - users get the same features regardless of OS
+
+
+## Feature configuration in source builds of libarrow
+
+Source builds are controlled by `r/inst/build_arrow_static.sh`. The key
+environment variable is `LIBARROW_MINIMAL`:
+
+- `LIBARROW_MINIMAL` unset: Default feature set (Parquet, Dataset, JSON, common compression ON; S3/GCS/jemalloc OFF)
+- `LIBARROW_MINIMAL=false`: Full feature set (adds S3, jemalloc, additional compression)
+- `LIBARROW_MINIMAL=true`: Truly minimal (disables Parquet, Dataset, JSON, most compression, SIMD)
+
+### Features always enabled
+
+These features are always built regardless of `LIBARROW_MINIMAL`:
+
+| Feature | CMake Flag | Notes |
+|---------|------------|-------|
+| Compute | `ARROW_COMPUTE=ON` | Core compute functions |
+| CSV | `ARROW_CSV=ON` | CSV reading/writing |
+| Filesystem | `ARROW_FILESYSTEM=ON` | Local filesystem support |
+| JSON | `ARROW_JSON=ON` | JSON reading |
+| Parquet | `ARROW_PARQUET=ON` | Parquet file format |
+| Dataset | `ARROW_DATASET=ON` | Multi-file datasets |
+| Acero | `ARROW_ACERO=ON` | Query execution engine |
+| Mimalloc | `ARROW_MIMALLOC=ON` | Memory allocator |
+| LZ4 | `ARROW_WITH_LZ4=ON` | LZ4 compression |
+| Snappy | `ARROW_WITH_SNAPPY=ON` | Snappy compression |
+| RE2 | `ARROW_WITH_RE2=ON` | Regular expressions |
+| UTF8Proc | `ARROW_WITH_UTF8PROC=ON` | Unicode support |
+
+### Features controlled by LIBARROW_MINIMAL

Review Comment:
   These are the docs I wish we had a few years ago, thank you for writing them!!
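
The three `LIBARROW_MINIMAL` cases in the quoted hunk can be read as a simple lookup. The sketch below is illustrative only: `describe_libarrow_features` is a hypothetical helper that is not part of the Arrow repository, the descriptions are copied from the vignette text rather than from `r/inst/build_arrow_static.sh`, and treating an empty value the same as unset and matching only the exact strings `true` and `false` are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not in the Arrow repo): print the feature set that the
# vignette text associates with a given LIBARROW_MINIMAL value.
# Assumption: an empty argument stands in for "unset".
describe_libarrow_features() {
  case "${1-}" in
    "")
      echo "default: Parquet, Dataset, JSON, common compression ON; S3/GCS/jemalloc OFF" ;;
    false)
      echo "full: adds S3, jemalloc, additional compression" ;;
    true)
      echo "minimal: disables Parquet, Dataset, JSON, most compression, SIMD" ;;
    *)
      # Anything else is reported on stderr and signalled via the exit status.
      echo "unrecognized LIBARROW_MINIMAL value: $1" >&2
      return 1 ;;
  esac
}

# Example: describe the mode selected by LIBARROW_MINIMAL=false.
describe_libarrow_features false
```

Calling `describe_libarrow_features "${LIBARROW_MINIMAL-}"` would describe whatever mode the current shell environment selects, under the same assumptions.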



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
