[ 
https://issues.apache.org/jira/browse/ARROW-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400415#comment-17400415
 ] 

Neal Richardson commented on ARROW-12981:
-----------------------------------------

I was thinking some more about how we might do this. I need to confirm, but I 
think that if we put a copy of {{cpp/}} inside of the {{tools}} directory in 
the package, it will be available at build time, but it shouldn't affect the 
size of the mac/windows binary packages (unlike if we put the source inside of 
{{inst}}). I would then add r/tools/cpp to .gitignore and add an rsync step to 
{{make build}} that copies the C++ source there, so that it is refreshed 
whenever I prepare the package for release. The find_local_source() function in 
tools/nixlibs.R could then be modified to also check inside tools/cpp for 
source. We would test this path in the ursa-labs/arrow-r-nightly packaging 
builds, though we could also devise a test build in arrow's CI.
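
As a rough sketch of the copy step and the extra lookup described above, something like the following could work. The function names {{vendor_cpp_source}} and {{find_vendored_source}} are hypothetical, as are the fallback to {{cp}} and the use of {{CMakeLists.txt}} as the marker for a usable source tree; the real change would live in the Makefile and in tools/nixlibs.R.

```shell
#!/bin/sh
# Hypothetical helper: copy the C++ source tree into r/tools/cpp so that it
# ships inside the source package. The paths follow the comment above; the
# function name and the cp fallback are assumptions for illustration.
vendor_cpp_source() {
  repo_root="$1"
  src="$repo_root/cpp"
  dest="$repo_root/r/tools/cpp"

  if [ ! -d "$src" ]; then
    echo "no cpp/ directory under $repo_root" >&2
    return 1
  fi

  mkdir -p "$dest"
  if command -v rsync >/dev/null 2>&1; then
    # --delete drops files from the copy that were removed in cpp/
    rsync -a --delete "$src/" "$dest/"
  else
    # Fallback when rsync is unavailable: recreate the directory, then copy
    rm -rf "$dest"
    mkdir -p "$dest"
    cp -R "$src/." "$dest/"
  fi
}

# Hypothetical shell counterpart of the extra check in find_local_source():
# print the vendored directory if it looks like a C++ source tree.
find_vendored_source() {
  pkg_root="$1"
  if [ -f "$pkg_root/tools/cpp/CMakeLists.txt" ]; then
    echo "$pkg_root/tools/cpp"
  else
    return 1
  fi
}
```

Calling {{vendor_cpp_source /path/to/arrow}} before building the package tarball would refresh the copy, and {{find_vendored_source /path/to/arrow/r}} would then print the directory to build from.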

There would then be two additional steps required for a full offline build, I 
think:

* Detect whether there is an internet connection at install time, and if not, 
disable any feature that requires a third-party dependency download. So you'd 
get arrow/feather, general conversion to/from R and Python, probably CSV 
reading/writing, but no parquet, dataset, etc.
* Provide a helper to download and bundle up those extra dependency downloads 
(wrapping the existing cpp script)
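
To illustrate the first of these steps, here is a minimal sketch of an install-time connectivity probe that picks a reduced feature set when nothing can be downloaded. The function names {{can_download}} and {{feature_flags}}, the default probe URL, and the exact selection of CMake options are assumptions for illustration; the actual list of features that need third-party downloads would have to be checked against the C++ build.

```shell
#!/bin/sh
# Hypothetical probe: succeed only if the URL answers within a few seconds.
# Uses curl when present, then wget; with neither, assume we are offline.
can_download() {
  url="${1:-https://github.com}"
  if command -v curl >/dev/null 2>&1; then
    curl --silent --head --fail --max-time 5 "$url" >/dev/null 2>&1
  elif command -v wget >/dev/null 2>&1; then
    wget --quiet --spider --tries=1 --timeout=5 "$url" >/dev/null 2>&1
  else
    return 1
  fi
}

# Hypothetical mapping from connectivity to CMake feature flags.
# Offline, anything that needs a third-party download is switched off;
# CSV is kept on because it probably needs no download.
feature_flags() {
  if [ "$1" = "online" ]; then
    echo "-DARROW_CSV=ON -DARROW_PARQUET=ON -DARROW_DATASET=ON -DARROW_JEMALLOC=ON -DARROW_MIMALLOC=ON"
  else
    echo "-DARROW_CSV=ON -DARROW_PARQUET=OFF -DARROW_DATASET=OFF -DARROW_JEMALLOC=OFF -DARROW_MIMALLOC=OFF"
  fi
}

# Print the flags that match the current connectivity.
choose_flags() {
  if can_download "$1"; then
    feature_flags online
  else
    feature_flags offline
  fi
}
```

A build script could then pass the output of {{choose_flags}} to cmake, so that an offline install degrades to the smaller feature set instead of failing on the first dependency download.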

> [R] Install source package from CRAN alone
> ------------------------------------------
>
>                 Key: ARROW-12981
>                 URL: https://issues.apache.org/jira/browse/ARROW-12981
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Packaging, R
>    Affects Versions: 4.0.1
>         Environment: Linux
>            Reporter: Karl Dunkle Werner
>            Assignee: Jonathan Keane
>            Priority: Major
>
> Hello,
> I would like to install {{Arrow}} on Linux using only CRAN, without 
> downloading additional files from Github, Apache, or Ursa Labs. I understand 
> this is a big ask, and might not be a priority for you all. Feel free to 
> close if you feel that this is out of scope.
> Why is a CRAN-only installation useful?
>  # It's common for organizations to set up firewalls that prevent arbitrary 
> downloads, but allow access to their own internal CRAN mirror.
>  ** Sometimes these firewalls also allow requests to Github, but often not.
>  # On a broader level, my favorite thing about R is CRAN, the CRAN 
> maintainers, and their 
> [policy|https://cran.r-project.org/web/packages/policies.html#Source-packages]
>  that "Source packages may not contain any form of binary executable code." 
> By distributing most of the Arrow code separately (either as source C++ or a 
> compiled library), automated code archives and other source-based tools 
> become much less useful.
> Of course, {{arrow}} isn't the only R package to depend on external libraries 
> or distribute code separately. If a CRAN-only approach isn't viable, it would 
> still be useful to have an all-offline method. I'm also having trouble 
> getting an offline install to work, even with a local copy of the Arrow repo. 
> (See the bottom of the script below.)
>  
> What does installing offline look like now?
>  Here's a bash script that approximates installing behind a firewall.
> {code:sh}
> git clone --depth 1 g...@github.com:apache/arrow.git test_arrow
> cd test_arrow
> wget 'https://cran.r-project.org/src/contrib/arrow_4.0.1.tar.gz'
> # Set up a temporary R library (optional)
> mkdir test_r_lib
> export R_LIBS_USER=test_r_lib
> export ARROW_R_DEV=true
> export LIBARROW_MINIMAL=false
> export LIBARROW_DOWNLOAD=false
> export LIBARROW_BINARY=false
> export LIBARROW_BUILD=true
> # These are all of the direct dependencies, including Suggests
> # This isn't required if the packages are already installed
> Rscript -e "install.packages(c('assertthat', 'bit64', 'purrr', 'R6', 'rlang', 
> 'tidyselect', 'vctrs', 'cpp11', 'decor', 'distro', 'dplyr', 'hms', 'knitr', 
> 'lubridate', 'pkgload', 'reticulate', 'rmarkdown', 'stringr', 'testthat', 
> 'tibble', 'withr'))"
> # Disable your internet connection here.
> # Now try to install the R package we downloaded with wget.
> # This is an approximation of being behind a firewall.
> Rscript -e 'install.packages("arrow_4.0.1.tar.gz", repos=NULL)'
> # It successfully installs the R component, but not the C++ library, 
> # even with LIBARROW_BUILD=true
> Rscript -e "arrow::arrow_available()"
> # [1] FALSE
> # As mentioned in the installation vignette, 
> # we can R CMD INSTALL in the git repo.
> R CMD INSTALL r
> # This will try to build the C++ library, but fails when mimalloc and 
> # jemalloc can't be downloaded from Github.
> # (Seems not to be affected by LIBARROW_DOWNLOAD=false).
> # When C++ compilation fails, the R component still installs.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
