jonkeane commented on a change in pull request #10930: URL: https://github.com/apache/arrow/pull/10930#discussion_r700569379
########## File path: r/vignettes/developing.Rmd ########## @@ -40,98 +38,125 @@ set -e set -x ``` -If you're looking to contribute to `arrow`, this document can help you set up a development environment that will enable you to write code and run tests locally. It outlines how to build the various components that make up the Arrow project and R package, as well as some common troubleshooting and workflows developers use. Many contributions can be accomplished with the instructions in [R-only development](#r-only-development). But if you're working on both the C++ library and the R package, the [Developer environment setup](#-developer-environment-setup) section will guide you through setting up a developer environment. +If you're looking to contribute to arrow, this vignette can help you set up a development environment that will enable you to write code and run tests locally. It outlines: + +* how to build the components that make up the Arrow project and R package +* some common troubleshooting and workflows that developers use + +This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation). + +For clarity, a quick note on terminology used in this vignette: -This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation); Linux users can find more details on building from source at `vignette("install", package = "arrow")`. +* "Apache Arrow" or "Arrow" - the Apache Arrow project, including implementations in different languages +* "libarrow" or "the Arrow C++ library" - Arrow's C++ library (N.B. this term might not be used in documentation in other parts of the Arrow project) +* "arrow" or "the Arrow R package" - the R package +* `arrow` - a directory into which the Arrow GitHub repo has been checked out Review comment: This is more polish than anything else, but I wonder if this style section would be better as a footnote? It's really good info, but it feels a bit of a sidestep right at the beginning. ########## File path: r/vignettes/developing.Rmd ########## @@ -201,62 +194,53 @@ To enable optional features including: S3 support, an alternative memory allocat Other flags that may be useful: -* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=bundled`, for example, or any other dependency `*_SOURCE`, if you have a system version of a C++ dependency that doesn't work correctly with Arrow. This tells the build to compile its own version of the dependency from source. +* `-DBoost_SOURCE=BUNDLED` and `-DThrift_SOURCE=BUNDLED`, for example, or any other dependency `*_SOURCE`, if you have a system version of a C++ dependency that doesn't work correctly with Arrow. This tells the build to compile its own version of the dependency from source. + * `-DCMAKE_BUILD_TYPE=debug` or `-DCMAKE_BUILD_TYPE=relwithdebinfo` can be useful for debugging. You probably don't want to do this generally because a debug build is much slower at runtime than the default `release` build. -_Note_ `cmake` is particularly sensitive to whitespacing, if you see errors, check that you don't have any errant whitespace around +_Note_ `cmake` is particularly sensitive to whitespacing, if you see errors, check that you don't have any errant whitespace. -### Build Arrow +### Step 3 - Building libarrow You can add `-j#` between `make` and `install` here too to speed up compilation by running in parallel (where `#` is the number of cores you have available). ```{bash, save=run & !(sys_install & ubuntu)} make -j8 install ``` -If you are installing on linux, and you are installing to the system, you may -need to use `sudo`: - -```{bash, save=run & sys_install & ubuntu} -sudo make install -``` +### Step 4 - Build the Arrow R package - -### Build the Arrow R package - -Once you’ve built the C++ library, you can install the R package and its +Once you've built libarrow, you can install the R package and its dependencies, along with additional dev dependencies, from the git checkout: ```{bash, save=run} popd # To go back to the root directory of the project, from cpp/build - pushd r R -e 'install.packages("remotes"); remotes::install_deps(dependencies = TRUE)' - R CMD INSTALL . ``` -### Compilation flags +#### Compilation flags If you need to set any compilation flags while building the C++ extensions, you can use the `ARROW_R_CXXFLAGS` environment variable. For example, if you are using `perf` to profile the R extensions, you may need to set -``` shell +```bash export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer ``` -### Developer Experience +#### Recompiling the C++ code -With the setups described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the Arrow library C++ has changed and there is a mismatch between the Arrow Library and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up. +With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterated and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the libarrow C++ has changed and there is a mismatch between libarrow and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up. Review comment: ```suggestion With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterate and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, `R CMD INSTALL .` should only need to recompile the files that have changed) _or_ if the libarrow C++ has changed and there is a mismatch between libarrow and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your set up. ``` ########## File path: r/vignettes/developing.Rmd ########## @@ -151,45 +176,13 @@ cmake \ .. ``` -`..` refers to the C++ source directory: we're in `cpp/build`, and the source is in `cpp`. - -#### Configure to install to a system directory - -If you would like to install Arrow as a system library you can do that as well. This is in some respects simpler, but if you already have Arrow libraries installed there, it would disrupt them and possibly require `sudo` permissions. - -Now we can move into the arrow repository to start the build process. You will need to create a directory into which the C++ build will put its contents. It is recommended to make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). And then, change directories to be inside `cpp/build`: - -```{bash, save=run & sys_install} -pushd arrow -mkdir -p cpp/build -pushd cpp/build -``` - -You’ll first call `cmake` to configure the build and then `make install`. For the R package, you’ll need to enable several features in the C++ library using `-D` flags: - -```{bash, save=run & sys_install} -cmake \ - -DARROW_COMPUTE=ON \ - -DARROW_CSV=ON \ - -DARROW_DATASET=ON \ - -DARROW_EXTRA_ERROR_CONTEXT=ON \ - -DARROW_FILESYSTEM=ON \ - -DARROW_INSTALL_NAME_RPATH=OFF \ - -DARROW_JEMALLOC=ON \ - -DARROW_JSON=ON \ - -DARROW_PARQUET=ON \ - -DARROW_WITH_SNAPPY=ON \ - -DARROW_WITH_ZLIB=ON \ - .. -``` - -`..` refers to the C++ source directory: we're in `cpp/build`, and the source is in `cpp`. +`..` refers to the C++ source directory: you're in `cpp/build`, and the source is in `cpp`. Review comment: ```suggestion `..` refers to the C++ source directory: you're in `cpp/build` and the source is in `cpp`. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
