nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698674825



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       ```suggestion
   #' - If you don't already have the `arrow` package installed, get this function by
   #'   `source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")`
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -373,7 +374,15 @@ ensure_cmake <- function() {
     )
     cmake_tar <- tempfile()
     cmake_dir <- tempfile()
-    try_download(cmake_binary_url, cmake_tar)
+    download_successful <- try_download(cmake_binary_url, cmake_tar)
+    if (!download_successful) {
+      cat(paste0(
+        "*** cmake was not found locally and download failed.\n",
+        "    Make sure cmake is installed and available on your PATH\n",
+        "    (or download '", cmake_binary_url,
+        "' and define the CMAKE environment variable).\n"
+      ))

Review comment:
       ```suggestion
         cat(paste0(
           "*** cmake was not found locally and download failed.\n",
           "    Make sure cmake >= 3.10 is installed and available on your PATH,\n",
           "    or download ", cmake_binary_url, "\n",
           "    and define the CMAKE environment variable.\n"
         ))
   ```
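If the script ever needs to enforce the `cmake >= 3.10` requirement named in the suggested message, a comparison along these lines could work. This is a minimal sketch: `cmake_meets_minimum()` is a hypothetical helper, and it assumes the version string has already been extracted (the `cmake_version()` function in nixlibs.R, whose body isn't shown in this diff, may already do that part).

```r
# Hypothetical helper, not part of nixlibs.R: compare a cmake version string
# against the 3.10 minimum mentioned in the suggested message.
cmake_meets_minimum <- function(version_string, minimum = "3.10") {
  # numeric_version() compares component by component, so "3.9.2" < "3.10"
  numeric_version(version_string) >= numeric_version(minimum)
}

cmake_meets_minimum("3.16.3") # TRUE
cmake_meets_minimum("3.9.2")  # FALSE
```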

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer with internet access
+#'
+#' ### On the computer without internet access:
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }
+
+  dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
+  # Run download_dependencies.sh
+  cat(paste0("*** Downloading optional dependencies to ", deps_dir, "\n"))
+  return_status <- system2(download_dependencies_sh,
+    args = deps_dir, stdout = FALSE, stderr = FALSE
+  )
+  if (isTRUE(return_status == 0)) {
+    cat(paste0(
+      "**** Set environment variable on offline machine and re-build arrow:\n",

Review comment:
       Should this message also tell you to copy the directory to the other 
machine?
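One possible shape for a message that includes the copy step, as a sketch only: `offline_build_message()` is a hypothetical helper, and the wording is illustrative rather than the PR's actual text (which is cut off in the diff above).

```r
# Hypothetical helper: build the post-download instructions, including the
# step of copying deps_dir to the offline machine.
offline_build_message <- function(deps_dir) {
  paste0(
    "**** Copy the directory '", deps_dir, "' to the offline machine.\n",
    "**** Then set ARROW_THIRDPARTY_DEPENDENCY_DIR to its new location\n",
    "**** on that machine and re-build arrow.\n"
  )
}

cat(offline_build_message("arrow-thirdparty"))
```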

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer with internet access
+#'
+#' ### On the computer without internet access:
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }

Review comment:
       ```suggestion
   download_optional_dependencies <- function(deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
     # This script is copied over from arrow/cpp/... to arrow/r/inst/...
     download_dependencies_sh <- system.file(
       "thirdparty/download_dependencies.sh",
       package = "arrow",
       mustWork = TRUE
     )
   ```
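With the suggested default, an unset `ARROW_THIRDPARTY_DEPENDENCY_DIR` gives `deps_dir = ""` rather than `NULL`, so an explicit check before `dir.create()` may be worth adding. A minimal sketch; `resolve_deps_dir()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper showing how the suggested default behaves.
# Sys.getenv() returns "" for an unset variable, so an empty string means
# neither an argument nor the environment variable was supplied.
resolve_deps_dir <- function(deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
  if (!nzchar(deps_dir)) {
    stop(
      "Specify `deps_dir` or set ARROW_THIRDPARTY_DEPENDENCY_DIR.",
      call. = FALSE
    )
  }
  deps_dir
}
```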

##########
File path: r/vignettes/install.Rmd
##########
@@ -304,10 +316,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script
+  will download prebuilt C++ binary or third-party source bundles as necessary.
   If you're in a checkout of the `apache/arrow` git repository

Review comment:
       ```suggestion
   ```

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that 
 > require external dependencies, you do not need to run `install_arrow()` 
 > after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are
+`ARROW_THIRDPARTY_DEPENDENCY_DIR` for the directory of downloaded dependencies
+and `TEST_OFFLINE_BUILD` to force the build process not to download.

Review comment:
       I don't think we should document this in this vignette. Users should not have to worry about this env var; it's for us, for testing.

##########
File path: r/vignettes/install.Rmd
##########
@@ -304,10 +316,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script
+  will download prebuilt C++ binary or third-party source bundles as necessary.
   If you're in a checkout of the `apache/arrow` git repository
-  and want to build the C++ library from the local source, make this `false`.
+  and want to build the C++ library from the local source, make this `false` or
+  not set. If building the C++ library from source with cmake unavailable, cmake

Review comment:
       ```suggestion
     If building the C++ library from source with cmake unavailable, cmake
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -29,17 +29,8 @@ if (getRversion() < 3.4 && is.null(getOption("download.file.method"))) {
 options(.arrow.cleanup = character()) # To collect dirs to rm on exit
 on.exit(unlink(getOption(".arrow.cleanup")))
 
+

Review comment:
       ```suggestion
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +300,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined

Review comment:
       Why?

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that 
 > require external dependencies, you do not need to run `install_arrow()` 
 > after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are

Review comment:
       These sentences should probably mention the offline/airgapped server use 
case and how you'd use it. 

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +422,144 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting is will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function is run in most typical cases -- when download_ok is TRUE *or*
+  # ARROW_THIRDPARTY_DEPENDENCY_DIR is set. It does *not* check if existing
+  # *_SOURCE_URL variables are set. (It is also run whenever ARROW_DEPENDENCY_SOURCE
+  # is "SYSTEM", but doesn't affect the build in that case.)
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  if (deps_dir == "") {
+    return(env_var_list)
+  }
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the directory doesn't exist, or if it exists but is empty.
+    # Here the build will continue, but will likely fail when the downloads are
+    # unavailable. The user will end up with the arrow-without-arrow package.
+    cat(paste0(
+      "*** Error: ARROW_THIRDPARTY_DEPENDENCY_DIR was set but has no files.\n",

Review comment:
       ```suggestion
         "*** Warning: ARROW_THIRDPARTY_DEPENDENCY_DIR was set but has no files.\n",
   ```
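For reference, the `replace()` call at the end of `turn_off_thirdparty_features()` both overwrites existing names and appends missing ones, as its comment says. A small illustration with made-up values (a named character vector is assumed here; a list behaves the same way for this call):

```r
# Made-up inputs: ARROW_S3 exists and gets overwritten,
# ARROW_JSON does not exist yet and gets appended.
env_var_list <- c(ARROW_S3 = "ON", EXTRA_CMAKE_FLAGS = "")
turn_off <- c(ARROW_S3 = "OFF", ARROW_JSON = "OFF")

result <- replace(env_var_list, names(turn_off), turn_off)
print(result)
```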

##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +43,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+# But binary defaults to not OK
+binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+
+quietly <- !env_is("ARROW_R_DEV", "true") # try_download uses quietly global
+# * download_ok, build_ok: Use prebuilt binary, if found, otherwise try to build
+# * !download_ok, build_ok: Build with local git checkout, if available, or
+#   sources included in r/tools/cpp/. Optional dependencies are not included,
+#   and will not be automatically downloaded.
+#   cmake will still be downloaded if necessary
+#   https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+# * download_ok, !build_ok: Only use prebuilt binary, if found
+# * neither: Get the arrow-without-arrow package
+# Download and build are OK unless you say not to (or can't access github)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+

Review comment:
       ```suggestion
   # For local debugging, set ARROW_R_DEV=TRUE to make this script print more
   quietly <- !env_is("ARROW_R_DEV", "true")
   
   # Default is build from source, not download a binary
   build_ok <- !env_is("LIBARROW_BUILD", "false")
   binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
   
   # Check if we're doing an offline build.
   # (Note that cmake will still be downloaded if necessary
   #  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
   download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
   
   ```
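The suggestion appears to drop the comment block that listed the four `download_ok`/`build_ok` combinations. If that summary is worth keeping, it could be encoded roughly like this. This is a sketch only: `describe_build_mode()` is hypothetical, and `env_is()` below is a stand-in, since the real helper in nixlibs.R isn't shown in this diff.

```r
# Stand-in for the env_is() helper in nixlibs.R (real definition not shown):
# case-insensitive comparison of an environment variable to a value.
env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value)

# Hypothetical: the four cases from the original comment block.
describe_build_mode <- function(download_ok, build_ok) {
  if (download_ok && build_ok) {
    "Use prebuilt binary, if found, otherwise try to build"
  } else if (build_ok) {
    "Build with local git checkout or r/tools/cpp/ sources (no optional dependencies)"
  } else if (download_ok) {
    "Only use prebuilt binary, if found"
  } else {
    "arrow-without-arrow package"
  }
}
```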

##########
File path: r/vignettes/install.Rmd
##########
@@ -343,6 +357,7 @@ By default, these are all unset. All boolean variables are 
case-insensitive.
 * `CMAKE`: When building the C++ library from source, you can specify a
   `/path/to/cmake` to use a different version than whatever is found on the `$PATH`
 
+

Review comment:
       ```suggestion
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Oh, I guess you're also relying on the package installation to deliver 
the download_dependencies.sh and versions.txt scripts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
