thisisnic commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613186933



##########
File path: r/vignettes/developing.Rmd
##########
@@ -0,0 +1,465 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Developer Documentation}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && 
options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, and covers common troubleshooting steps and developer 
workflows. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). If you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through building 
the C++ library from source.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows, you can download a .zip file with the arrow dependencies from the
+[nightly 
repository](https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/windows/),
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ library will by default use system dependencies if suitable 
versions are found; if they are not present, it will build them during its own 
build process. The only dependencies that one needs to install outside of the 
build process are `cmake` (for configuring the build) and `openssl` if you are 
building with S3 support.
+
+For a faster build, you may choose to install on the system more C++ library 
dependencies (such as `lz4`, `zstd`, etc.) so that they don't need to be built 
from source in the Arrow build. This is optional.
+
+#### macOS
+```{bash, save=run & macos}
+brew install cmake openssl
+```
+
+#### Ubuntu
+```{bash, save=run & ubuntu}
+sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
+```
+
+### Configure the Arrow build {.tabset}
+
+You can choose to build and then install the Arrow library into a user-defined 
directory or into a system-level directory. You only need to do one of these 
two options.
+
+Either way, you will need to create a directory into which the C++ build will 
put its contents. It is recommended to make a `build` directory inside of the 
`cpp` directory of the Arrow git repository (it is git-ignored, so you won't 
accidentally check it in).
+
+Starting from your git checkout of `apache/arrow`,
+
+```{bash, save=run}
+mkdir -p cpp/build
+```
+
+#### Install to another directory
+
+It is recommended that you install the arrow library to a user-level directory 
to be used in development. In this example we will install it to a directory 
called `dist` that has the same parent as our `arrow` checkout, but it could be 
named or located anywhere you would like. However, note that your installation 
of the Arrow R package will point to this directory and need it to remain 
intact for the package to continue to work. This is one reason we recommend 
*not* placing it inside of the arrow git checkout.
+
+```{bash, save=run & !sys_install}
+export ARROW_HOME=$(pwd)/dist
+mkdir $ARROW_HOME
+```
+
+
+_Special instructions on Linux:_ You will need to add `$ARROW_HOME/lib` (the 
directory the libraries are installed into) to `LD_LIBRARY_PATH` before 
launching R and using Arrow. One way to do this is to add it to your profile 
(we use `~/.bash_profile` here, but you might need to put this in a different 
file depending on your setup). On macOS this is not necessary because the 
shared library paths are hardcoded to their locations at build time.
+
+```{bash, save=run & ubuntu & !sys_install}
+export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
+echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile
+```
+
+To build, change directories to be inside `arrow/cpp/build`:
+
+```{bash, save=run & !sys_install}
+pushd cpp/build
+```
+
+You’ll first call `cmake` to configure the build and then `make install`. For 
the R package, you’ll need to enable several features in the C++ library using 
`-D` flags:
+
+```{bash, save=run & !sys_install}
+cmake \
+  -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
+  -DCMAKE_INSTALL_LIBDIR=lib \
+  -DARROW_COMPUTE=ON \
+  -DARROW_CSV=ON \
+  -DARROW_DATASET=ON \
+  -DARROW_FILESYSTEM=ON \
+  -DARROW_JEMALLOC=ON \
+  -DARROW_JSON=ON \
+  -DARROW_PARQUET=ON \
+  -DARROW_WITH_SNAPPY=ON \
+  -DARROW_WITH_ZLIB=ON \
+  -DARROW_INSTALL_NAME_RPATH=OFF \
+  ..
+```
+
+`..` refers to the C++ source directory: we're in `cpp/build`, and the source 
is in `cpp`.
+
+#### Install to the system
+
+If you would like to install Arrow as a system library, you can do that as 
well. This is in some respects simpler, but it may require `sudo` permissions, 
and it will disrupt any Arrow libraries that are already installed there.
+
+To build, change directories to be inside `arrow/cpp/build`:
+
+```{bash, save=run & sys_install}
+pushd cpp/build
+```
+
+You’ll first call `cmake` to configure the build and then `make install`. For 
the R package, you’ll need to enable several features in the C++ library using 
`-D` flags:
+
+```{bash, save=run & sys_install}
+cmake \
+  -DARROW_COMPUTE=ON \
+  -DARROW_CSV=ON \
+  -DARROW_DATASET=ON \
+  -DARROW_FILESYSTEM=ON \
+  -DARROW_JEMALLOC=ON \
+  -DARROW_JSON=ON \
+  -DARROW_PARQUET=ON \
+  -DARROW_WITH_SNAPPY=ON \
+  -DARROW_WITH_ZLIB=ON \
+  -DARROW_INSTALL_NAME_RPATH=OFF \
+  ..
+```
+
+### More Arrow features
+
+To enable optional features, including S3 support, an alternative memory 
allocator, and additional compression libraries, add some or all of these flags:
+
+``` shell
+  -DARROW_MIMALLOC=ON \
+  -DARROW_S3=ON \
+  -DARROW_WITH_BROTLI=ON \
+  -DARROW_WITH_BZ2=ON \
+  -DARROW_WITH_LZ4=ON \
+  -DARROW_WITH_SNAPPY=ON \
+  -DARROW_WITH_ZLIB=ON \
+  -DARROW_WITH_ZSTD=ON \
+```
+
+Other flags that may be useful:
+
+* `-DARROW_EXTRA_ERROR_CONTEXT=ON` makes errors coming from the C++ library 
point to files and line numbers
+* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=BUNDLED`, for example, or the 
`*_SOURCE` flag of any other dependency, if you have a system version of a C++ 
dependency that doesn't work correctly with Arrow. This tells the build to 
compile its own version of the dependency from source.
+* `-DCMAKE_BUILD_TYPE=debug` and `-DCMAKE_BUILD_TYPE=relwithdebinfo` can be 
useful for debugging, though they are both slower to compile than the default 
`release`.
+
+
+### Build Arrow
+
+You can `-j#` here too to speed up compilation by running in parallel (where 
`#` is the number of cores you have available).

Review comment:
       Tiny point, but it took me a couple of reads to realise that this 
sentence means you can add `-j#` to the command below.
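
A minimal sketch of that reading, assuming the command that follows is the `make install` mentioned earlier in the vignette. The `NPROC` and `BUILD_CMD` names and the core-count fallback chain are hypothetical, chosen for illustration, and do not come from the PR:

```shell
# Detect the number of available cores: nproc on Linux, sysctl on macOS,
# falling back to 1 if neither tool is present.
NPROC=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)

# Assemble the parallel build command. It is only printed here; inside
# cpp/build you would run it directly instead of echoing it.
BUILD_CMD="make -j${NPROC} install"
echo "$BUILD_CMD"
```

Spelling the flag out next to the command it modifies, as above, would remove the ambiguity in "You can `-j#` here too".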



