karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698569604



##########
File path: r/tools/nixlibs.R
##########
@@ -82,7 +91,7 @@ download_binary <- function(os = identify_os()) {
 # * `TRUE` (not case-sensitive), to try to discover your current OS, or
 # * some other string, presumably a related "distro-version" that has binaries
 #   built that work for your OS
-identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", 
Sys.getenv("LIBARROW_DOWNLOAD"))) {
+identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", 
Sys.getenv("TEST_OFFLINE_BUILD"))) {

Review comment:
       Good catch. I think it should actually just look at `LIBARROW_BINARY`:
   
   ```r
   identify_os <- function(os = Sys.getenv("LIBARROW_BINARY")) {
     ...
   ```
   
   It's maybe worth noting that:
   * `identify_os` won't be called at all when `TEST_OFFLINE_BUILD` is `true` 
(but could be called if it was set to anything else)
   * At an earlier step, `configure` sets `LIBARROW_BINARY=true` if it was 
unset and `NOT_CRAN` is `true`
   
   
https://github.com/apache/arrow/blob/5a13cbf81ee66172b63341d20acf51efc03d0c97/r/tools/nixlibs.R#L581-L583
   
   
   

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` 
that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       Sure! Subheading seem like a great idea. Something like this?
   
   ```r
   #' ## Steps for an offline install with optional dependencies:
   #'
   #' ### On a computer with internet access:
   #' - Install the `arrow` package
   #' - Run this function
   #' - Copy the saved dependency files to the computer with internet access
   #'
   #' ### On the computer without internet access:
   #' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` 
that
   #'   points to the newly copied folder of dependency files.
   #' - Install the `arrow` package
   #' - Run [arrow_info()] to check installed capabilities
   ```
   

##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       Sounds good!

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yes, unless you can think of a better way! As @jonkeane [pointed 
out](https://github.com/apache/arrow/pull/11001/#discussion_r698528943), it's 
possible to download those files from github, but protecting against version 
mismatch (between what's needed by `tools/cpp/` and what's listed in github's 
`versions.txt`) could be challenging.

##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +300,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it 
defined

Review comment:
       Later in the code I have:
   
https://github.com/apache/arrow/blob/98b5601f94ff0f0caf240c6e1b914d4e8e49f98e/r/tools/nixlibs.R#L447-L452
   
   If we don't add `EXTRA_CMAKE_FLAGS` to the vector, that section could 
instead be
   ```r
       # The syntax to turn off XSIMD is different.
       # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
       "EXTRA_CMAKE_FLAGS" = paste(
         Sys.getenv("EXTRA_CMAKE_FLAGS"),
         "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
       )
   ```
   
   I did it the first way to have the `EXTRA_CMAKE_FLAGS` collected at the same 
time as the other existing build flags, but if you think the second way is 
cleaner, I'm happy to change it.

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer with internet access
+#'
+#' ### On the computer without internet access:
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` 
that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or 
similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != 
"") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }
+
+  dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
+  # Run download_dependencies.sh
+  cat(paste0("*** Downloading optional dependencies to ", deps_dir, "\n"))
+  return_status <- system2(download_dependencies_sh,
+    args = deps_dir, stdout = FALSE, stderr = FALSE
+  )
+  if (isTRUE(return_status == 0)) {
+    cat(paste0(
+      "**** Set environment variable on offline machine and re-build arrow:\n",

Review comment:
       As I'm thinking about what to write, I feel like I'm just duplicating 
the help text. What about this message instead? (Or no message at all.)
   ```
   **** Download successful to <directory>
        See ?download_optional_dependencies for more details.
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yeah.... fair

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that 
 > require external dependencies, you do not need to run `install_arrow()` 
 > after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are
+`ARROW_THIRDPARTY_DEPENDENCY_DIR` for the directory of downloaded dependencies
+and `TEST_OFFLINE_BUILD` to force the build process not to download.

Review comment:
       Should I also remove it from the summary at the end of this vignette? It 
seems helpful to mention it somewhere, but I could also move the comment to the 
Developing vignette.

##########
File path: dev/tasks/tasks.yml
##########
@@ -1033,6 +1033,14 @@ tasks:
       flags: '-e ARROW_SOURCE_HOME="/arrow" -e FORCE_BUNDLED_BUILD=TRUE -e 
LIBARROW_BUILD=TRUE -e ARROW_DEPENDENCY_SOURCE=SYSTEM'
       image: ubuntu-r-only-r
 
+  test-r-offline-minimal:
+      ci: azure
+      template: r/azure.linux.yml
+      params:
+        r_org: rocker
+        r_image: r-base
+        r_tag: latest
+        flags: '-e TEST_OFFLINE_BUILD=true'

Review comment:
       I put this one on azure because the one above was on azure, but feel 
free to change to a different platform.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to