jonkeane commented on a change in pull request #10930:
URL: https://github.com/apache/arrow/pull/10930#discussion_r694277148
##########
File path: r/vignettes/developing.Rmd
##########
@@ -1,21 +1,23 @@
---
title: "Arrow R Developer Guide"
-output: rmarkdown::html_vignette
+output:
+ html_document:
+ toc: true
+ toc_depth: 2
vignette: >
%\VignetteIndexEntry{Arrow R Developer Guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
+
Review comment:
Super nit picky: is this new line needed?
##########
File path: r/vignettes/developing.Rmd
##########
@@ -60,36 +71,38 @@ brew install apache-arrow
brew install apache-arrow --HEAD
```
+### Windows and Linux
+
On Windows and Linux, you can download a .zip file with the arrow dependencies
from the
nightly repository.
-Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to that
-zip file before installing the `arrow` R package. On Linux, you'll need to
create a `libarrow` directory inside the R package directory and unzip that
file into it. Version numbers in that
-repository correspond to dates, and you will likely want the most recent.
To see what nightlies are available, you can use Arrow's (or any other S3
client's) S3 listing functionality to see what is in the bucket
`s3://arrow-r-nightly/libarrow/bin`:
```
nightly <- s3_bucket("arrow-r-nightly")
nightly$ls("libarrow/bin")
```
+Version numbers in that repository correspond to dates.
-## Developer environment setup
+#### Windows
-If you need to alter both the Arrow C++ library and the R package code, or if
you can’t get a binary version of the latest C++ library elsewhere, you’ll need
to build it from source too. This section discusses how to set up a C++ build
configured to work with the R package. For more general resources, see the
[Arrow C++ developer
-guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to the zip file containing the arrow dependencies before installing the arrow R
package.
+
+#### Linux
+
+On Linux, you'll need to create a `libarrow` directory inside the R package
directory and unzip the zip file containing the arrow dependencies into it.
Review comment:
```suggestion
On Linux, you'll need to create a `libarrow` directory inside the R package
directory and unzip the zip file containing the compiled arrow binary files
into it.
```
##########
File path: r/vignettes/developing.Rmd
##########
@@ -101,37 +114,48 @@ brew install cmake openssl
sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
```
-### Configure the Arrow build {.tabset}
+#### Windows
+
+Currently, the R package cannot be made to work with a locally-built Arrow C++
library. This will be resolved in a future release.
+
+### Step 2 - Configure the Arrow build
Review comment:
Same note about the `{.tabset}` here as well. If we do want to get rid
of them completely, that's totally fine, but if we do get rid of the tabs, I
think we should pick either user directory install or system install and only
document one of those. Having both of these sections run one after the other on
the pkgdown site (which I assume is where a large portion of people reading
this read it from) is more confusing than it is helpful. Hiding one or the
other with tabs makes it less distracting to document both, because one can
totally ignore whichever doesn't apply.
##########
File path: r/vignettes/developing.Rmd
##########
@@ -202,11 +226,12 @@ To enable optional features including: S3 support, an
alternative memory allocat
Other flags that may be useful:
* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=bundled`, for example, or any
other dependency `*_SOURCE`, if you have a system version of a C++ dependency
that doesn't work correctly with Arrow. This tells the build to compile its own
version of the dependency from source.
Review comment:
```suggestion
* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=BUNDLED`, for example, or
any other dependency `*_SOURCE`, if you have a system version of a C++
dependency that doesn't work correctly with Arrow. This tells the build to
compile its own version of the dependency from source.
```
I believe these are both not case sensitive (though we should probably
check). This is something I noted when looking at this page a few days ago but
didn't have a chance to come and fix it separately.
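One quick way to check, assuming the `*_SOURCE` handling lives in
`cpp/cmake_modules/ThirdpartyToolchain.cmake` (that is where I'd look first,
but I haven't confirmed it for this case; the helper name is just
illustrative):

```shell
# Sketch only: list the lines in a CMake file that mention BUNDLED in any
# case, so we can see whether the value is normalised before it is compared.
find_bundled_checks() {
  grep -n -i 'bundled' "$1"
}

# From the root of an arrow checkout (the path is an assumption):
# find_bundled_checks cpp/cmake_modules/ThirdpartyToolchain.cmake
```

This only tells us where to look; it doesn't by itself settle whether
lowercase values are accepted.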
##########
File path: r/vignettes/install.Rmd
##########
@@ -123,6 +123,39 @@ you'll need to reinstall the package in order to enable S3
support.
# How dependencies are resolved
+There are a number of scripts that are triggered when `R CMD INSTALL .` is
run.
+For Arrow users, these should all just work without configuration and pull in
+the most complete pieces (e.g. official binaries that we host).
Review comment:
I wonder if it would be good to flag that most of this information is
really only intended to help developers and not day-to-day installers of Arrow?
I don't mind (and actually kind of like!) that it's in the install vignette as
opposed to the developing vignette, but most of the information is still
probably targets at (and most relevant to) developers and not standard
installers of Arrow.
There's also a reference at the end that says "(If you're looking at this
script, and you've gotten this far, it might look
incredibly familiar: it's basically the contents of this guide in script
form —
with a few important changes)" which we probably should link over to the
developing vignette there.
I don't think we want to take out some of the content just because it's not
relevant to installers (since it's useful info and we don't want to have two
different largely overlapping sections in both vignettes)
##########
File path: r/vignettes/developing.Rmd
##########
@@ -483,60 +434,156 @@ variables or other settings:
* All tests are skipped on Linux if the package builds without the C++
libarrow.
To make the build fail if libarrow is not available (as in, to test that
the C++ build was successful), set `TEST_R_WITH_ARROW=true`
+
* Some tests are disabled unless `ARROW_R_DEV=true`
+
* Tests that require allocating >2GB of memory to test Large types are disabled
unless `ARROW_LARGE_MEMORY_TESTS=true`
+
* Integration tests against a real S3 bucket are disabled unless credentials
are set in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; these are
available
on request
+
* S3 tests using [MinIO](https://min.io/) locally are enabled if the
`minio server` process is found running. If you're running MinIO with custom
settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
`MINIO_PORT` to override the defaults.
-## Github workflows
+## Running checks
+
+You can run package checks by using `devtools::check()` and check test
coverage with `covr::package_coverage()`.
+
+```r
+# All package checks
+devtools::check()
+
+# See test coverage statistics
+covr::report()
+covr::package_coverage()
+
+```
+
+For full package validation, you can run the following commands from a
terminal.
+
+```
+R CMD build .
+R CMD check arrow_*.tar.gz --as-cran
+```
+
+
+## Running additional CI checks
On a pull request, there are some actions you can trigger by commenting on the
PR. We have additional CI checks that run nightly and can be requested on
demand using an internal tool called
[crosssbow](https://arrow.apache.org/docs/developers/crossbow.html). A few
important GitHub comment commands include:
-* `@github-actions crossbow submit -g r` for all extended R CI tests
-* `@github-actions crossbow submit {task-name}` for running a specific task.
See the `r:` group definition near the beginning of the [crossbow
configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
for a list of glob expression patterns that match names of items in the
`tasks:` list below it.
-* `@github-actions autotune` will run and fix lint c++ linting errors + run R
documentation (among other cleanup tasks) and commit them to the branch
+#### Run all extended R CI tasks
+`@github-actions crossbow submit -g r`
+This runs each of the R-related CI tasks.
-## Useful functions for Arrow developers
+#### Run a specific task
+`@github-actions crossbow submit {task-name}`
-Within an R session, these can help with package development:
+See the `r:` group definition near the beginning of the [crossbow
configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
for a list of glob expression patterns that match names of items in the
`tasks:` list below it.
-``` r
-# Load the dev package
-devtools::load_all()
+#### Run linting and documentation building tasks
-# Run the test suite, optionally filtering file names
-devtools::test(filter="^regexp$")
-# or the Makefile alternative from the arrow/r directory in a shell:
-make test file=regexp
+`@github-actions autotune`
-# Update roxygen documentation
-devtools::document()
+This will run and fix lint C++ linting errors, run R documentation (among
other cleanup tasks), and commit the resulting updates to the branch.
-# To preview the documentation website
-pkgdown::build_site()
+# Troubleshooting
-# All package checks; see also below
-devtools::check()
+Note that after any change to the C++ library, you must reinstall it and
+run `make clean` or `git clean -fdx .` to remove any cached object code
+in the `r/src/` directory before reinstalling the R package. This is
+only necessary if you make changes to the C++ library source; you do not
+need to manually purge object files if you are only editing R or C++
+code inside `r/`.
-# See test coverage statistics
-covr::report()
-covr::package_coverage()
+## Arrow library-R package mismatches
+
+If the Arrow library and the R package have diverged, you will see errors like:
+
+```
+Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath
= DLLpath, ...):
+ unable to load shared object
'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
+
dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so,
6): Symbol not found:
__ZN5arrow2io16RandomAccessFile9ReadAsyncERKNS0_9IOContextExx
+ Referenced from:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
+ Expected in: flat namespace
+ in
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
+Error: loading failed
+Execution halted
+ERROR: loading failed
```
-Any of those can be run from the command line by wrapping them in `R -e
-'$COMMAND'`. There’s also a `Makefile` to help with some common tasks
-from the command line (`make test`, `make doc`, `make clean`, etc.)
+To resolve this, try rebuilding the Arrow library from [Building Arrow
above](#step-3-building-arrow).
-### Full package validation
+## Multiple versions of Arrow library
-``` shell
-R CMD build .
-R CMD check arrow_*.tar.gz --as-cran
+If rebuilding the Arrow library doesn't work and you are [installing from a
user-level directory](#configure-for-installing-to-a-user-directory) and you
already have a previous installation of libarrow in a system directory or you
get you may get errors like the following when you install the R package:
Review comment:
```suggestion
If rebuilding the Arrow library doesn't work and you are [installing from a
user-level directory](#configure-for-installing-to-a-user-directory) and you
already have a previous installation of libarrow in a system directory or you
get errors like the following when you install the R package:
```
I'm not totally sure this is what you intended, but I think it is?
##########
File path: r/vignettes/developing.Rmd
##########
@@ -60,36 +71,38 @@ brew install apache-arrow
brew install apache-arrow --HEAD
```
+### Windows and Linux
+
On Windows and Linux, you can download a .zip file with the arrow dependencies
from the
nightly repository.
-Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to that
-zip file before installing the `arrow` R package. On Linux, you'll need to
create a `libarrow` directory inside the R package directory and unzip that
file into it. Version numbers in that
-repository correspond to dates, and you will likely want the most recent.
To see what nightlies are available, you can use Arrow's (or any other S3
client's) S3 listing functionality to see what is in the bucket
`s3://arrow-r-nightly/libarrow/bin`:
```
nightly <- s3_bucket("arrow-r-nightly")
nightly$ls("libarrow/bin")
```
+Version numbers in that repository correspond to dates.
-## Developer environment setup
+#### Windows
-If you need to alter both the Arrow C++ library and the R package code, or if
you can’t get a binary version of the latest C++ library elsewhere, you’ll need
to build it from source too. This section discusses how to set up a C++ build
configured to work with the R package. For more general resources, see the
[Arrow C++ developer
-guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to the zip file containing the arrow dependencies before installing the arrow R
package.
+
+#### Linux
+
+On Linux, you'll need to create a `libarrow` directory inside the R package
directory and unzip the zip file containing the arrow dependencies into it.
+
+## R and C++
-There are four major steps to the process — the first three are relevant to
all Arrow developers, and the last one is specific to the R bindings:
+If you need to alter both the Arrow C++ library and the R package code, or if
you can't get a binary version of the latest C++ library elsewhere, you'll need
to build it from source. This section discusses how to set up a C++ build
configured to work with the R package. For more general resources, see the
[Arrow C++ developer
guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-1. Configuring the Arrow library build (using `cmake`) — this specifies how
you want the build to go, what features to include, etc.
-2. Building the Arrow library — this actually compiles the Arrow library
-3. Install the Arrow library — this organizes and moves the compiled Arrow
library files into the location specified in the configuration
-4. Building the R package — this builds the C++ code in the R package, and
installs the R package for you
+There are five major steps to the process — the first four are relevant to all
Arrow developers, and the last one is specific to developers making changes to
the R package.
Review comment:
This is a very minor point, so we might want to ignore it totally /
gloss over it here, but technically "the first four are relevant to all Arrow
developers" isn't _quite_ true. There are (small) differences in building the
cpp code between the languages that use it (for example the flags recommended
[for python
development](https://arrow.apache.org/docs/developers/python.html#build-and-test)
are slightly different — we could in principle recommend that folks use a
union of the flags and then they would be able to do either, though compilation
would be longer, so I'm not sure we want to do that). Additionally, other
languages, like Rust, don't rely on the cpp code at all AFAIK, so for _those_
arrow developers this isn't super relevant.
All of that said, it might be best to leave it as is, or say something like
"are similar to steps that Arrow developers working on other languages use" and
leave it be. The details above in this comment are definitely *not* relevant
for someone building Arrow for R development for the first time
##########
File path: r/vignettes/developing.Rmd
##########
@@ -1,21 +1,23 @@
---
title: "Arrow R Developer Guide"
-output: rmarkdown::html_vignette
+output:
+ html_document:
+ toc: true
+ toc_depth: 2
Review comment:
I'm curious if there's a general movement away from
`rmarkdown::html_vignette`? I've always used `rmarkdown::html_vignette` because
I thought it was streamlined for inclusion in the package (and pkgdown
overrides this stuff anyway for the web docs).
##########
File path: r/vignettes/developing.Rmd
##########
@@ -483,60 +434,156 @@ variables or other settings:
* All tests are skipped on Linux if the package builds without the C++
libarrow.
To make the build fail if libarrow is not available (as in, to test that
the C++ build was successful), set `TEST_R_WITH_ARROW=true`
+
* Some tests are disabled unless `ARROW_R_DEV=true`
+
* Tests that require allocating >2GB of memory to test Large types are disabled
unless `ARROW_LARGE_MEMORY_TESTS=true`
+
* Integration tests against a real S3 bucket are disabled unless credentials
are set in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; these are
available
on request
+
* S3 tests using [MinIO](https://min.io/) locally are enabled if the
`minio server` process is found running. If you're running MinIO with custom
settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
`MINIO_PORT` to override the defaults.
-## Github workflows
+## Running checks
+
+You can run package checks by using `devtools::check()` and check test
coverage with `covr::package_coverage()`.
+
+```r
+# All package checks
+devtools::check()
+
+# See test coverage statistics
+covr::report()
+covr::package_coverage()
+
Review comment:
```suggestion
```
##########
File path: r/vignettes/developing.Rmd
##########
@@ -60,36 +71,38 @@ brew install apache-arrow
brew install apache-arrow --HEAD
```
+### Windows and Linux
+
On Windows and Linux, you can download a .zip file with the arrow dependencies
from the
nightly repository.
-Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to that
-zip file before installing the `arrow` R package. On Linux, you'll need to
create a `libarrow` directory inside the R package directory and unzip that
file into it. Version numbers in that
-repository correspond to dates, and you will likely want the most recent.
To see what nightlies are available, you can use Arrow's (or any other S3
client's) S3 listing functionality to see what is in the bucket
`s3://arrow-r-nightly/libarrow/bin`:
```
nightly <- s3_bucket("arrow-r-nightly")
nightly$ls("libarrow/bin")
```
+Version numbers in that repository correspond to dates.
-## Developer environment setup
+#### Windows
-If you need to alter both the Arrow C++ library and the R package code, or if
you can’t get a binary version of the latest C++ library elsewhere, you’ll need
to build it from source too. This section discusses how to set up a C++ build
configured to work with the R package. For more general resources, see the
[Arrow C++ developer
-guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to the zip file containing the arrow dependencies before installing the arrow R
package.
+
+#### Linux
+
+On Linux, you'll need to create a `libarrow` directory inside the R package
directory and unzip the zip file containing the arrow dependencies into it.
+
+## R and C++
-There are four major steps to the process — the first three are relevant to
all Arrow developers, and the last one is specific to the R bindings:
+If you need to alter both the Arrow C++ library and the R package code, or if
you can't get a binary version of the latest C++ library elsewhere, you'll need
to build it from source. This section discusses how to set up a C++ build
configured to work with the R package. For more general resources, see the
[Arrow C++ developer
guide](https://arrow.apache.org/docs/developers/cpp/building.html).
-1. Configuring the Arrow library build (using `cmake`) — this specifies how
you want the build to go, what features to include, etc.
-2. Building the Arrow library — this actually compiles the Arrow library
-3. Install the Arrow library — this organizes and moves the compiled Arrow
library files into the location specified in the configuration
-4. Building the R package — this builds the C++ code in the R package, and
installs the R package for you
+There are five major steps to the process — the first four are relevant to all
Arrow developers, and the last one is specific to developers making changes to
the R package.
-### Install dependencies {.tabset}
+### Step 1 - Install dependencies
Review comment:
Did you intend to get rid of `{.tabset}` here? It's a bit funky how it
works (and is very much non-standard/hacked on to pkgdown), but it gives us a
nice tabbed interface for interacting with the setup steps (e.g.
https://arrow.apache.org/docs/r/articles/developing.html#install-dependencies).
##########
File path: r/vignettes/developing.Rmd
##########
@@ -483,60 +434,156 @@ variables or other settings:
* All tests are skipped on Linux if the package builds without the C++
libarrow.
To make the build fail if libarrow is not available (as in, to test that
the C++ build was successful), set `TEST_R_WITH_ARROW=true`
+
* Some tests are disabled unless `ARROW_R_DEV=true`
+
* Tests that require allocating >2GB of memory to test Large types are disabled
unless `ARROW_LARGE_MEMORY_TESTS=true`
+
* Integration tests against a real S3 bucket are disabled unless credentials
are set in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; these are
available
on request
+
* S3 tests using [MinIO](https://min.io/) locally are enabled if the
`minio server` process is found running. If you're running MinIO with custom
settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
`MINIO_PORT` to override the defaults.
-## Github workflows
+## Running checks
+
+You can run package checks by using `devtools::check()` and check test
coverage with `covr::package_coverage()`.
+
+```r
+# All package checks
+devtools::check()
+
+# See test coverage statistics
+covr::report()
+covr::package_coverage()
+
+```
+
+For full package validation, you can run the following commands from a
terminal.
+
+```
+R CMD build .
+R CMD check arrow_*.tar.gz --as-cran
+```
+
+
+## Running additional CI checks
On a pull request, there are some actions you can trigger by commenting on the
PR. We have additional CI checks that run nightly and can be requested on
demand using an internal tool called
[crosssbow](https://arrow.apache.org/docs/developers/crossbow.html). A few
important GitHub comment commands include:
-* `@github-actions crossbow submit -g r` for all extended R CI tests
-* `@github-actions crossbow submit {task-name}` for running a specific task.
See the `r:` group definition near the beginning of the [crossbow
configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
for a list of glob expression patterns that match names of items in the
`tasks:` list below it.
-* `@github-actions autotune` will run and fix lint c++ linting errors + run R
documentation (among other cleanup tasks) and commit them to the branch
+#### Run all extended R CI tasks
+`@github-actions crossbow submit -g r`
+This runs each of the R-related CI tasks.
-## Useful functions for Arrow developers
+#### Run a specific task
+`@github-actions crossbow submit {task-name}`
-Within an R session, these can help with package development:
+See the `r:` group definition near the beginning of the [crossbow
configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
for a list of glob expression patterns that match names of items in the
`tasks:` list below it.
-``` r
-# Load the dev package
-devtools::load_all()
+#### Run linting and documentation building tasks
-# Run the test suite, optionally filtering file names
-devtools::test(filter="^regexp$")
-# or the Makefile alternative from the arrow/r directory in a shell:
-make test file=regexp
+`@github-actions autotune`
-# Update roxygen documentation
-devtools::document()
+This will run and fix lint C++ linting errors, run R documentation (among
other cleanup tasks), and commit the resulting updates to the branch.
Review comment:
```suggestion
This will run and fix lint C++ linting errors, run R documentation (among
other cleanup tasks), run styler on any changed R code, and commit the
resulting updates to the branch.
```
##########
File path: r/vignettes/developing.Rmd
##########
@@ -60,36 +71,38 @@ brew install apache-arrow
brew install apache-arrow --HEAD
```
+### Windows and Linux
+
On Windows and Linux, you can download a .zip file with the arrow dependencies
from the
nightly repository.
-Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to that
-zip file before installing the `arrow` R package. On Linux, you'll need to
create a `libarrow` directory inside the R package directory and unzip that
file into it. Version numbers in that
-repository correspond to dates, and you will likely want the most recent.
To see what nightlies are available, you can use Arrow's (or any other S3
client's) S3 listing functionality to see what is in the bucket
`s3://arrow-r-nightly/libarrow/bin`:
```
nightly <- s3_bucket("arrow-r-nightly")
nightly$ls("libarrow/bin")
```
+Version numbers in that repository correspond to dates.
-## Developer environment setup
+#### Windows
-If you need to alter both the Arrow C++ library and the R package code, or if
you can’t get a binary version of the latest C++ library elsewhere, you’ll need
to build it from source too. This section discusses how to set up a C++ build
configured to work with the R package. For more general resources, see the
[Arrow C++ developer
-guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+Windows users then can set the `RWINLIB_LOCAL` environment variable to point
to the zip file containing the arrow dependencies before installing the arrow R
package.
+
+#### Linux
+
+On Linux, you'll need to create a `libarrow` directory inside the R package
directory and unzip the zip file containing the arrow dependencies into it.
Review comment:
It does include some dependencies (and technically libarrow *is* a
dependency of the R package), but I think this is a clearer description of what
we are doing here.
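As a rough sketch of what that Linux step amounts to (the helper name, the zip
filename, and the package path below are placeholders I made up, not real
artifact names):

```shell
# Sketch only: unpack a downloaded libarrow zip into the R package directory.
# Usage: unpack_libarrow <path-to-zip> <path-to-r-package-dir>
unpack_libarrow() {
  zip_path="$1"
  pkg_dir="$2"
  # Bail out before creating anything if the zip is not there.
  if [ ! -f "$zip_path" ]; then
    echo "zip file not found: $zip_path" >&2
    return 1
  fi
  # Create the libarrow directory inside the package and unzip into it.
  mkdir -p "$pkg_dir/libarrow"
  unzip -q -o "$zip_path" -d "$pkg_dir/libarrow"
}

# Hypothetical invocation; both paths are placeholders:
# unpack_libarrow ~/Downloads/arrow-nightly.zip ~/arrow/r
```

What ends up in `libarrow` is the compiled binary files, which is why I think
that wording is clearer.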
##########
File path: r/vignettes/developing.Rmd
##########
@@ -483,60 +434,156 @@ variables or other settings:
* All tests are skipped on Linux if the package builds without the C++
libarrow.
To make the build fail if libarrow is not available (as in, to test that
the C++ build was successful), set `TEST_R_WITH_ARROW=true`
+
* Some tests are disabled unless `ARROW_R_DEV=true`
+
* Tests that require allocating >2GB of memory to test Large types are disabled
unless `ARROW_LARGE_MEMORY_TESTS=true`
+
* Integration tests against a real S3 bucket are disabled unless credentials
are set in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; these are
available
on request
+
* S3 tests using [MinIO](https://min.io/) locally are enabled if the
`minio server` process is found running. If you're running MinIO with custom
settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
`MINIO_PORT` to override the defaults.
-## Github workflows
+## Running checks
+
+You can run package checks by using `devtools::check()` and check test
coverage with `covr::package_coverage()`.
+
+```r
+# All package checks
+devtools::check()
+
+# See test coverage statistics
+covr::report()
+covr::package_coverage()
+
+```
+
+For full package validation, you can run the following commands from a
terminal.
+
+```
+R CMD build .
+R CMD check arrow_*.tar.gz --as-cran
+```
+
+
+## Running additional CI checks
On a pull request, there are some actions you can trigger by commenting on the
PR. We have additional CI checks that run nightly and can be requested on
demand using an internal tool called
[crosssbow](https://arrow.apache.org/docs/developers/crossbow.html). A few
important GitHub comment commands include:
-* `@github-actions crossbow submit -g r` for all extended R CI tests
-* `@github-actions crossbow submit {task-name}` for running a specific task.
See the `r:` group definition near the beginning of the [crossbow
configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
for a list of glob expression patterns that match names of items in the
`tasks:` list below it.
-* `@github-actions autotune` will run and fix lint c++ linting errors + run R
documentation (among other cleanup tasks) and commit them to the branch
+#### Run all extended R CI tasks
+`@github-actions crossbow submit -g r`
+This runs each of the R-related CI tasks.
-## Useful functions for Arrow developers
+#### Run a specific task
+`@github-actions crossbow submit {task-name}`
-Within an R session, these can help with package development:
+See the `r:` group definition near the beginning of the [crossbow
configuration](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml)
for a list of glob expression patterns that match names of items in the
`tasks:` list below it.
-``` r
-# Load the dev package
-devtools::load_all()
+#### Run linting and documentation building tasks
-# Run the test suite, optionally filtering file names
-devtools::test(filter="^regexp$")
-# or the Makefile alternative from the arrow/r directory in a shell:
-make test file=regexp
+`@github-actions autotune`
-# Update roxygen documentation
-devtools::document()
+This will run and fix lint C++ linting errors, run R documentation (among
other cleanup tasks), and commit the resulting updates to the branch.
-# To preview the documentation website
-pkgdown::build_site()
+# Troubleshooting
-# All package checks; see also below
-devtools::check()
+Note that after any change to the C++ library, you must reinstall it and
+run `make clean` or `git clean -fdx .` to remove any cached object code
+in the `r/src/` directory before reinstalling the R package. This is
+only necessary if you make changes to the C++ library source; you do not
+need to manually purge object files if you are only editing R or C++
+code inside `r/`.
-# See test coverage statistics
-covr::report()
-covr::package_coverage()
+## Arrow library-R package mismatches
+
+If the Arrow library and the R package have diverged, you will see errors like:
+
+```
+Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath
= DLLpath, ...):
+ unable to load shared object
'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
+
dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so,
6): Symbol not found:
__ZN5arrow2io16RandomAccessFile9ReadAsyncERKNS0_9IOContextExx
+ Referenced from:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
+ Expected in: flat namespace
+ in
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
+Error: loading failed
+Execution halted
+ERROR: loading failed
```
-Any of those can be run from the command line by wrapping them in `R -e
-'$COMMAND'`. There’s also a `Makefile` to help with some common tasks
-from the command line (`make test`, `make doc`, `make clean`, etc.)
+To resolve this, try rebuilding the Arrow library from [Building Arrow
above](#step-3-building-arrow).
-### Full package validation
+## Multiple versions of Arrow library
-``` shell
-R CMD build .
-R CMD check arrow_*.tar.gz --as-cran
+If rebuilding the Arrow library doesn't work and you are [installing from a
user-level directory](#configure-for-installing-to-a-user-directory) and you
already have a previous installation of libarrow in a system directory or you
get you may get errors like the following when you install the R package:
+
+```
+Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath
= DLLpath, ...):
+ unable to load shared object
'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
+
dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so,
6): Library not loaded: /usr/local/lib/libarrow.400.dylib
+ Referenced from: /usr/local/lib/libparquet.400.dylib
+ Reason: image not found
```
+
+You need to make sure that you don't let R link to your system library when
building arrow. You can do this a number of different ways:
+
+* Setting the `MAKEFLAGS` environment variable to `"LDFLAGS="` (see below for
an example) this is the recommended way to accomplish this
+* Using {withr}'s `with_makevars(list(LDFLAGS = ""), ...)`
+* adding `LDFLAGS=` to your `~/.R/Makevars` file (the least recommended way,
though it is a common debugging approach suggested online)
+
+```{bash, save=run & !sys_install & macos, hide=TRUE}
+# Setup troubleshooting section
+# install a system-level arrow on macOS
+brew install apache-arrow
+```
+
+
+```{bash, save=run & !sys_install & ubuntu, hide=TRUE}
+# Setup troubleshooting section
+# install a system-level arrow on Ubuntu
+sudo apt update
+sudo apt install -y -V ca-certificates lsb-release wget
+wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr
'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename
--short).deb
+sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release
--codename --short).deb
+sudo apt update
+sudo apt install -y -V libarrow-dev
+```
+
+```{bash, save=run & !sys_install & macos}
+MAKEFLAGS="LDFLAGS=" R CMD INSTALL .
+```
+
+
+## `rpath` issues
+
+If the package fails to install/load with an error like this:
+
+```
+ ** testing if installed package can be loaded from temporary location
+ Error: package or namespace load failed for 'arrow' in dyn.load(file,
DLLpath = DLLpath, ...):
+ unable to load shared object
'/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so':
+ dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not
loaded: @rpath/libarrow.14.dylib
+```
+
+ensure that `-DARROW_INSTALL_NAME_RPATH=OFF` was passed (this is important on
+macOS to prevent problems at link time and is a no-op on other platforms).
+Alternatively, try setting the environment variable `R_LD_LIBRARY_PATH` to
+wherever Arrow C++ was put in `make install`, e.g. `export
+R_LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package.
+
+When installing from source, if the R and C++ library versions do not
+match, installation may fail. If you've previously installed the
+libraries and want to upgrade the R package, you'll need to update the
+Arrow C++ library first.
+
+For any other build/configuration challenges, see the [C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+## Other installation issues
+
+There are a number of scripts that are triggered when the arrow R package is
installed. For package users who are not interacting with the underlying code,
these should all just work without configuration and pull in the most complete
pieces (e.g. official binaries that we host). However, knowing about these
scripts can help package developers troubleshoot if things go wrong in them or
things go wrong in an install. See [the installation vignette](./install.html)
for more information.
Review comment:
Could/should we link to the `How dependencies are resolved` section in
the install vignette here?