jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698472411



##########
File path: r/tools/nixlibs.R
##########
@@ -82,7 +91,7 @@ download_binary <- function(os = identify_os()) {
 # * `TRUE` (not case-sensitive), to try to discover your current OS, or
 # * some other string, presumably a related "distro-version" that has binaries
 #   built that work for your OS
-identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("LIBARROW_DOWNLOAD"))) {
+identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("TEST_OFFLINE_BUILD"))) {

Review comment:
       If I'm following the logic here correctly, if `LIBARROW_BINARY` is 
unset, this will only attempt to identify the OS when `TEST_OFFLINE_BUILD` is 
`TRUE`. Is that what we want here?
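
   To check this reading in an R session: the nested `Sys.getenv()` in the new default falls back to `TEST_OFFLINE_BUILD` only when `LIBARROW_BINARY` is not set. `resolve_os_setting()` below is a hypothetical helper that copies that default expression so it can be evaluated on its own. `"ubuntu-18.04"` is an illustrative distro-version string, not one taken from this PR.

   ```r
   # Hypothetical helper: evaluates the same default expression used for the
   # `os` argument of the new identify_os() in r/tools/nixlibs.R
   resolve_os_setting <- function() {
     # Sys.getenv(x, unset): `unset` is returned only when x is not set
     Sys.getenv("LIBARROW_BINARY", Sys.getenv("TEST_OFFLINE_BUILD"))
   }

   # Neither variable set: the default is "", not TRUE
   Sys.unsetenv(c("LIBARROW_BINARY", "TEST_OFFLINE_BUILD"))
   resolve_os_setting()
   #> [1] ""

   # Only TEST_OFFLINE_BUILD set: its value becomes the `os` setting
   Sys.setenv(TEST_OFFLINE_BUILD = "true")
   resolve_os_setting()
   #> [1] "true"

   # LIBARROW_BINARY set: it takes precedence over TEST_OFFLINE_BUILD
   Sys.setenv(LIBARROW_BINARY = "ubuntu-18.04")
   resolve_os_setting()
   #> [1] "ubuntu-18.04"
   ```

   With `LIBARROW_BINARY` unset, a value like `TEST_OFFLINE_BUILD=true` would, per the comment block above (`TRUE`, not case-sensitive), be treated as a request to discover the OS. The previous `LIBARROW_DOWNLOAD` fallback no longer applies. Run this in a throwaway session, since it modifies environment variables.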

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       Super minor, but this is an attempt to clarify which steps happen on which
machine. If the parentheticals are too clunky, we could use subheadings instead,
since the first two steps happen on one computer and the rest on the other.
   
   ```suggestion
   #' - Install the `arrow` package (on a computer with internet access)
   #' - Run this function (on a computer with internet access)
   #' - Copy the saved dependency files to the computer without internet access
   #' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
   #'   points to the folder (on the computer without internet access)
   #' - Install the `arrow` package (on the computer without internet access)
   #' - Run [arrow_info()] to check installed capabilities (on the computer without internet access)
   ```
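
   The two-machine split could also be shown as a short script. This is only a sketch. `check_offline_deps_dir()` is a hypothetical pre-flight helper, not part of the package. `download_optional_dependencies()` and `arrow_info()` are the functions documented in this PR. The commented-out install line assumes the `arrow` source tarball (placeholder name `arrow_<version>.tar.gz`) has also been copied to the offline machine, since `install.packages()` cannot reach CRAN there.

   ```r
   # Hypothetical pre-flight check for the computer without internet access:
   # confirm ARROW_THIRDPARTY_DEPENDENCY_DIR points at a non-empty directory
   # before starting a source build that would otherwise try to download.
   check_offline_deps_dir <- function(deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
     if (!nzchar(deps_dir)) {
       stop("ARROW_THIRDPARTY_DEPENDENCY_DIR is not set", call. = FALSE)
     }
     if (!dir.exists(deps_dir)) {
       stop("Dependency directory does not exist: ", deps_dir, call. = FALSE)
     }
     files <- list.files(deps_dir)
     if (length(files) == 0) {
       stop("Dependency directory is empty: ", deps_dir, call. = FALSE)
     }
     invisible(files)
   }

   ## (on a computer with internet access)
   # install.packages("arrow")
   # arrow::download_optional_dependencies("arrow-thirdparty")
   # ...then copy the "arrow-thirdparty" directory to the offline computer

   ## (on the computer without internet access)
   # Sys.setenv(ARROW_THIRDPARTY_DEPENDENCY_DIR = "/path/to/arrow-thirdparty")
   # check_offline_deps_dir()
   # install.packages("arrow_<version>.tar.gz", repos = NULL, type = "source")
   # arrow::arrow_info()
   ```

   Setting the variable with `Sys.setenv()` before `install.packages()` works because the build subprocess inherits the R session's environment.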

##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       For the purposes of our CI job, we will have the checkout available, so we
could copy the files over in that process. But for people trying to install with
the script, that's an issue. We could attempt to grab those files from GitHub if
they can't be found with `system.file()`, but that opens up another can of worms:
we'd have to make sure we're grabbing the right version of those files for the
install to work.
   
   I don't think it's the end of the world to require the double installation 
until we find a better solution.
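
   One way to cover the CI case without fetching anything from GitHub is to look for the installed copy first and fall back to the repository checkout. A minimal sketch, assuming the CI job exports `ARROW_SOURCE_HOME` (as the existing `tasks.yml` flags do) and that the script lives at `cpp/thirdparty/download_dependencies.sh` in the checkout. `locate_download_script()` is a hypothetical helper name.

   ```r
   # Hypothetical helper: find download_dependencies.sh without network access
   locate_download_script <- function(package = "arrow",
                                      source_home = Sys.getenv("ARROW_SOURCE_HOME")) {
     # 1. The copy shipped in the installed package (inst/thirdparty/...).
     #    With mustWork = FALSE, system.file() returns "" instead of erroring.
     installed <- system.file("thirdparty/download_dependencies.sh",
                              package = package, mustWork = FALSE)
     if (nzchar(installed)) {
       return(installed)
     }
     # 2. A source checkout, e.g. in CI where the repository is available
     if (nzchar(source_home)) {
       in_checkout <- file.path(source_home, "cpp", "thirdparty",
                                "download_dependencies.sh")
       if (file.exists(in_checkout)) {
         return(in_checkout)
       }
     }
     # 3. Otherwise stop, rather than guess which version to fetch from GitHub
     stop("Could not find download_dependencies.sh; install the 'arrow' package ",
          "first or set ARROW_SOURCE_HOME to an Arrow source checkout.",
          call. = FALSE)
   }
   ```

   Users installing from CRAN still need the double installation, but CI can skip the first install by pointing `ARROW_SOURCE_HOME` at the checkout.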

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       That looks great

##########
File path: dev/tasks/tasks.yml
##########
@@ -1033,6 +1033,14 @@ tasks:
       flags: '-e ARROW_SOURCE_HOME="/arrow" -e FORCE_BUNDLED_BUILD=TRUE -e LIBARROW_BUILD=TRUE -e ARROW_DEPENDENCY_SOURCE=SYSTEM'
       image: ubuntu-r-only-r
 
+  test-r-offline-minimal:
+      ci: azure
+      template: r/azure.linux.yml
+      params:
+        r_org: rocker
+        r_image: r-base
+        r_tag: latest
+        flags: '-e TEST_OFFLINE_BUILD=true'

Review comment:
       Azure is fine for this one. TBH, I picked GitHub Actions for the maximal
build out of convenience, since we already have a model there that has dependent
steps. But our CI system (AKA Crossbow) is designed to be spread across a
number of services like this, so it's totally fine to use two different services
for these two jobs.

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,66 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#' @param download_dependencies_sh location of the dependency download script,
+#' defaults to the one included with the arrow package.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `download_optional_dependencies(my_dependencies)`
+#' * Copy the directory `my-arrow-dependencies` to the computer without internet access
+#'
+#' ### On the computer without internet access, use the pre-downloaded dependencies:
+#' * Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied `my_dependencies`.
+#' * Install the `arrow` package
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(
+  deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR"),
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh = system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE

Review comment:
       To start, I've made this an argument to the function so that we can call
it in CI without installing the package first. We could also do this with an
environment variable, like we do for `deps_dir` (either internally or as an
argument here). I don't have strong feelings one way or the other, though since
this is pretty much internal/CI use only, we might be best off not exposing it
as an argument at all.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
