[ 
https://issues.apache.org/jira/browse/ARROW-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402683#comment-17402683
 ] 

Karl Dunkle Werner commented on ARROW-12981:
--------------------------------------------

That sounds really promising! If you'd like, I can take a stab at a PR.

 
A couple miscellaneous questions:
* Should the helper you mentioned be an R function (wrapping the existing cpp 
script) included in the package? Or did you have something else in mind? 
* Currently if {{LIBARROW_BINARY}} is {{false}} and {{LIBARROW_DOWNLOAD}} is 
{{true}}, the package will download the C++ code from the apache.org server and 
compile it. Should it just use the included {{tools/cpp}} code instead?
* One way to write the tests would be to have an environment variable that 
prevents the code from attempting downloads (as opposed to actually blocking 
the R session's access to the internet). I think {{LIBARROW_DOWNLOAD}} already 
prevents downloading for the C++ library, but not the dependencies. Can I 
change the scope of {{LIBARROW_DOWNLOAD=false}} to also block downloading 
dependencies? Or would it make sense to add another variable?
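
To make that last question concrete, here is a rough sketch of the "one switch blocks every download" idea, written as shell for illustration only. The function names {{download_allowed}} and {{guarded_fetch}} are hypothetical, and treating an unset {{LIBARROW_DOWNLOAD}} as "allowed" is my assumption, not the package's documented behavior.

```shell
# Hypothetical guard: succeeds (returns 0) only when downloads are permitted.
# Assumption: an unset LIBARROW_DOWNLOAD means downloads are allowed.
download_allowed() {
  case "$(printf '%s' "${LIBARROW_DOWNLOAD:-true}" | tr '[:upper:]' '[:lower:]')" in
    false|0|off|no) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical wrapper: every fetch, whether for the C++ source or for a
# third-party dependency, goes through the same check before touching the network.
guarded_fetch() {
  url="$1"
  dest="$2"
  if ! download_allowed; then
    echo "Download blocked by LIBARROW_DOWNLOAD: $url" >&2
    return 1
  fi
  curl --fail --silent --show-error --location --output "$dest" "$url"
}
```

With a single guard like this, a test could set {{LIBARROW_DOWNLOAD=false}} and assert that nothing is fetched, without actually cutting off the R session's network access.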


h5. If I'm thinking about this correctly, there will be two possible user 
workflows.

Let me know if you had something else in mind!

h6. Option 1:
1. Install the {{arrow}} package from (a local mirror of) CRAN
  * this will include a feature-light version of the C++ library
  * the install process will automatically detect that it can't download the 
C++ library and extra dependencies aren't available

2. Use the package with some features missing
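
For the "automatically detect" step in Option 1, one possible approach is a short connectivity probe with a timeout. This is only a sketch: {{can_download}} and {{choose_build_mode}} are hypothetical names, the mode strings are made up for illustration, and the URL to probe is left as a parameter because I don't know which host the real check should use.

```shell
# Hypothetical probe: succeeds only if a HEAD request to the URL completes
# within the timeout (default 5 seconds). Fails if curl is missing too.
can_download() {
  url="$1"
  timeout_s="${2:-5}"
  curl --silent --head --fail --max-time "$timeout_s" --output /dev/null "$url"
}

# Hypothetical decision: fall back to the bundled, feature-light build
# whenever the probe fails.
choose_build_mode() {
  if can_download "$1" "${2:-5}"; then
    echo "download"
  else
    echo "bundled-minimal"
  fi
}
```

The timeout matters here: behind some firewalls a request hangs rather than failing quickly, and the install shouldn't stall on it.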


h6. Option 2:
1. Install the R package somewhere with internet access
2. Run the helper function to download the extra dependencies
3. Copy the dependencies to the firewalled environment
4. Set the appropriate environment variables to use the downloaded dependencies
5. Install the package
  * this will build the C++ library from {{tools/cpp}} and the downloaded 
dependencies

6. Use the package with all desired features
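
Steps 3 to 5 of Option 2 might look roughly like this on the firewalled machine. Everything here is a sketch of the proposed workflow, not something that works today: {{stage_offline_deps}} and {{ARROW_OFFLINE_DEPS_DIR}} are hypothetical names, and the variable that would actually point the build at the copied dependencies hasn't been decided.

```shell
# Hypothetical helper for the firewalled machine: check that the copied
# dependency directory exists, then export the settings for an offline build.
stage_offline_deps() {
  src_dir="$1"   # directory copied over from the machine with internet access
  if [ ! -d "$src_dir" ]; then
    echo "Dependency directory not found: $src_dir" >&2
    return 1
  fi
  # ARROW_OFFLINE_DEPS_DIR is a made-up name used only for this sketch.
  ARROW_OFFLINE_DEPS_DIR="$(cd "$src_dir" && pwd)"
  export ARROW_OFFLINE_DEPS_DIR
  # Existing variables from the issue description: no downloads, build from source.
  export LIBARROW_DOWNLOAD=false
  export LIBARROW_BINARY=false
  export LIBARROW_BUILD=true
}
```

After calling {{stage_offline_deps /path/to/copied/deps}}, the install itself would be the usual {{R CMD INSTALL}} on the source tarball, assuming the build learns to read the dependency directory.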


> [R] Install source package from CRAN alone
> ------------------------------------------
>
>                 Key: ARROW-12981
>                 URL: https://issues.apache.org/jira/browse/ARROW-12981
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Packaging, R
>    Affects Versions: 4.0.1
>         Environment: Linux
>            Reporter: Karl Dunkle Werner
>            Assignee: Neal Richardson
>            Priority: Major
>             Fix For: 6.0.0
>
>
> Hello,
> I would like to install {{Arrow}} on Linux using only CRAN, without 
> downloading additional files from Github, Apache, or Ursa Labs. I understand 
> this is a big ask, and might not be a priority for you all. Feel free to 
> close if you feel that this is out of scope.
> Why is a CRAN-only installation useful?
>  # It's common for organizations to set up firewalls that prevent arbitrary 
> downloads, but allow access to their own internal CRAN mirror.
>  ** Sometimes these firewalls also allow requests to Github, but often not.
>  # On a broader level, my favorite thing about R is CRAN, the CRAN 
> maintainers, and their 
> [policy|https://cran.r-project.org/web/packages/policies.html#Source-packages]
>  that "Source packages may not contain any form of binary executable code." 
> By distributing most of the Arrow code separately (either as source C++ or a 
> compiled library), automated code archives and other source-based tools 
> become much less useful.
> Of course, {{arrow}} isn't the only R package to depend on external libraries 
> or distribute code separately. If a CRAN-only approach isn't viable, it would 
> still be useful to have an all-offline method. I'm also having trouble 
> getting an offline install to work, even with a local copy of the Arrow repo. 
> (See the bottom of the script below.)
>  
> What does installing offline look like now?
>  Here's a bash script that approximates installing behind a firewall.
> {code:sh}
> git clone --depth 1 git@github.com:apache/arrow.git test_arrow
> cd test_arrow
> wget 'https://cran.r-project.org/src/contrib/arrow_4.0.1.tar.gz'
> # Set up a temporary R library (optional)
> mkdir test_r_lib
> export R_LIBS_USER=test_r_lib
> export ARROW_R_DEV=true
> export LIBARROW_MINIMAL=false
> export LIBARROW_DOWNLOAD=false
> export LIBARROW_BINARY=false
> export LIBARROW_BUILD=true
> # These are all of the direct dependencies, including Suggests
> # This isn't required if the packages are already installed
> Rscript -e "install.packages(c('assertthat', 'bit64', 'purrr', 'R6', 'rlang', 
> 'tidyselect', 'vctrs', 'cpp11', 'decor', 'distro', 'dplyr', 'hms', 'knitr', 
> 'lubridate', 'pkgload', 'reticulate', 'rmarkdown', 'stringr', 'testthat', 
> 'tibble', 'withr'))"
> # Disable your internet connection here.
> # Now try to install the R package we downloaded with wget.
> # This is an approximation of being behind a firewall.
> Rscript -e 'install.packages("arrow_4.0.1.tar.gz", repos=NULL)'
> # It successfully installs the R component, but not the C++ library, 
> # even with LIBARROW_BUILD=true
> Rscript -e "arrow::arrow_available()"
> # [1] FALSE
> # As mentioned in the installation vignette, 
> # we can R CMD INSTALL in the git repo.
> R CMD INSTALL r
> # This will try to build the C++ library, but fails when mimalloc and 
> # jemalloc can't be downloaded from Github.
> # (Seems not to be affected by LIBARROW_DOWNLOAD=false).
> # When C++ compilation fails, the R component still installs.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
