[
https://issues.apache.org/jira/browse/ARROW-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358654#comment-17358654
]
Neal Richardson commented on ARROW-12981:
-----------------------------------------
Possibly, though we'd need to think about what our workflow would look like for
building and testing that setup, and it would be good to confirm that the
benefits of doing so outweigh the costs. Open to suggestions.
> Wish: Install source package from CRAN alone
> --------------------------------------------
>
> Key: ARROW-12981
> URL: https://issues.apache.org/jira/browse/ARROW-12981
> Project: Apache Arrow
> Issue Type: Wish
> Components: Packaging, R
> Affects Versions: 4.0.1
> Environment: Linux
> Reporter: Karl Dunkle Werner
> Priority: Major
>
> Hello,
> I would like to install {{Arrow}} on Linux using only CRAN, without
> downloading additional files from GitHub, Apache, or Ursa Labs. I understand
> this is a big ask, and might not be a priority for you all. Feel free to
> close if you feel that this is out of scope.
> Why is a CRAN-only installation useful?
> # It's common for organizations to set up firewalls that prevent arbitrary
> downloads, but allow access to their own internal CRAN mirror.
> ** Sometimes these firewalls also allow requests to GitHub, but often not.
> # On a broader level, my favorite thing about R is CRAN, the CRAN
> maintainers, and their
> [policy|https://cran.r-project.org/web/packages/policies.html#Source-packages]
> that "Source packages may not contain any form of binary executable code."
> Because most of the Arrow code is distributed separately (either as C++ source or a
> compiled library), automated code archives and other source-based tools
> become much less useful.
> Of course, {{arrow}} isn't the only R package to depend on external libraries
> or distribute code separately. If a CRAN-only approach isn't viable, it would
> still be useful to have an all-offline method. I'm also having trouble
> getting an offline install to work, even with a local copy of the Arrow repo.
> (See the bottom of the script below.)
>
> What does installing offline look like now?
> Here's a bash script that approximates installing behind a firewall.
> {code:sh}
> git clone --depth 1 git@github.com:apache/arrow.git test_arrow
> cd test_arrow
> wget 'https://cran.r-project.org/src/contrib/arrow_4.0.1.tar.gz'
> # Set up a temporary R library (optional)
> mkdir test_r_lib
> export R_LIBS_USER=test_r_lib
> export ARROW_R_DEV=true
> export LIBARROW_MINIMAL=false
> export LIBARROW_DOWNLOAD=false
> export LIBARROW_BINARY=false
> export LIBARROW_BUILD=true
> # These are all of the direct dependencies, including Suggests
> # This isn't required if the packages are already installed
> Rscript -e "install.packages(c('assertthat', 'bit64', 'purrr', 'R6', 'rlang',
> 'tidyselect', 'vctrs', 'cpp11', 'decor', 'distro', 'dplyr', 'hms', 'knitr',
> 'lubridate', 'pkgload', 'reticulate', 'rmarkdown', 'stringr', 'testthat',
> 'tibble', 'withr'))"
> # Disable your internet connection here.
> # Now try to install the R package we downloaded with wget.
> # This is an approximation of being behind a firewall.
> Rscript -e 'install.packages("arrow_4.0.1.tar.gz", repos=NULL)'
> # It successfully installs the R component, but not the C++ library,
> # even with LIBARROW_BUILD=true
> Rscript -e "arrow::arrow_available()"
> # [1] FALSE
> # As mentioned in the installation vignette,
> # we can R CMD INSTALL in the git repo.
> R CMD INSTALL r
> # This will try to build the C++ library, but fails when mimalloc and
> # jemalloc can't be downloaded from GitHub.
> # (Seems not to be affected by LIBARROW_DOWNLOAD=false).
> # When C++ compilation fails, the R component still installs.
> {code}
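The script above depends on a manual "Disable your internet connection here" step, so it is easy to run an install attempt that is accidentally still online. A minimal sketch, assuming bash with /dev/tcp redirection support and the coreutils timeout command: it bundles the LIBARROW_* settings from the script into one function and fails early if GitHub is still reachable. The helper names set_offline_arrow_env, can_reach, and assert_offline are hypothetical; they are not part of the arrow package or its install scripts, and whether the build honors each variable is as described in the report, not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for the offline install experiment above.
# They are NOT part of the arrow package or its configure/install scripts.

# Export the same build settings used in the script, so every install
# attempt (CRAN tarball or git checkout) sees identical values.
set_offline_arrow_env() {
  export ARROW_R_DEV=true
  export LIBARROW_MINIMAL=false
  export LIBARROW_DOWNLOAD=false
  export LIBARROW_BINARY=false
  export LIBARROW_BUILD=true
}

# Return 0 if a TCP connection to HOST:PORT (default port 443) opens within
# 3 seconds, non-zero otherwise (refused, DNS failure, or timeout).
# Host and port are passed as positional arguments to avoid quoting issues.
can_reach() {
  local host=$1 port=${2:-443}
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Fail if any given host (default: github.com) is reachable on port 443,
# i.e. if the simulated firewall is not actually in effect.
assert_offline() {
  if [ "$#" -eq 0 ]; then
    set -- github.com
  fi
  local host
  for host in "$@"; do
    if can_reach "$host" 443; then
      echo "still online: ${host}:443 is reachable" >&2
      return 1
    fi
  done
  echo "offline: none of $* is reachable on port 443"
}

# Example usage, run from inside the test_arrow checkout:
#   set_offline_arrow_env
#   assert_offline github.com && R CMD INSTALL r
```

Checking reachability before each install attempt separates the two failure modes in the report: the build being unable to fetch mimalloc and jemalloc because the network really is blocked, versus a run that silently downloaded them because the network was still up.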
--
This message was sent by Atlassian Jira
(v8.3.4#803005)