rolweber opened a new issue #10226:
URL: https://github.com/apache/arrow/issues/10226


   Hello,
   
   I'm building container images with conda environments that include both 
pyarrow and R arrow from CRAN. The builds were stable for several weeks. Then 
there were problems two weeks ago, which eventually resolved (see 
[ARROW-12502](https://issues.apache.org/jira/browse/ARROW-12502) in JIRA). 
Today, builds are breaking again, not related to the previous problem afaict. 
I'm looking for advice on how to:
   1. Get the builds working again.
   2. Make the installation less fragile for the future.
   
   On Linux x86, I first install pyarrow 3.0.0 from PyPI, which uses a 
wheel with pre-built native libs. Then I install R arrow 3.0.0 from CRAN, 
which tries to build its own native libs, I think. Then I run a few unit 
tests to make sure that both pyarrow and R arrow are working and can exchange 
data. When there are problems with installing R arrow from CRAN, the build 
doesn't necessarily fail at that step, but only later in the unit tests. That's 
what's happening today.
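
One way to surface the failure at the installation step rather than in the unit tests is a smoke check that runs right after the CRAN install. The following is only a sketch under assumptions, not part of the current build: it shells out to `Rscript` and relies on `arrow::arrow_available()`, which reports whether the R package was built against the Arrow C++ library. The helper name `r_arrow_is_usable` and its injectable `run` parameter are hypothetical.

```python
import subprocess

# R expression that raises an error (non-zero Rscript exit status)
# unless the arrow package loads and reports its C++ library as available.
R_CHECK = "stopifnot(arrow::arrow_available())"


def r_arrow_is_usable(run=subprocess.run):
    """Return True if Rscript can load arrow with its C++ library.

    `run` is injectable so the check can be exercised without R installed.
    """
    try:
        result = run(["Rscript", "-e", R_CHECK], capture_output=True, text=True)
    except FileNotFoundError:
        # Rscript itself is not on PATH.
        return False
    return result.returncode == 0
```

A build step could call `r_arrow_is_usable()` and exit with a non-zero status when it returns `False`, so the image build stops before the unit tests run.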
   
   I've set ARROW_R_DEV=true to get some information about the installation 
problem, as the default output doesn't even print an error message. This is the 
problem today (builds were still working last Friday):
   ```txt
   -- thrift_ep configure command succeeded.  See also /tmp/RtmpE3CJCi/file19d083c11a8/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure-*.log
   [ 50%] Performing build step for 'thrift_ep'
   -- thrift_ep build command succeeded.  See also /tmp/RtmpE3CJCi/file19d083c11a8/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-*.log
   [ 51%] Performing install step for 'thrift_ep'
   CMake Error at /tmp/RtmpE3CJCi/file19d083c11a8/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install-RELEASE.cmake:37 (message):
     Command failed: 2
      'make' 'install'
     See also
       /tmp/RtmpE3CJCi/file19d083c11a8/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install-*.log
   -- stdout output is:
   -- stderr output is:
   make[3]: *** No rule to make target 'install'.  Stop.
   CMake Error at /tmp/RtmpE3CJCi/file19d083c11a8/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install-RELEASE.cmake:47 (message):
     Stopping after outputting logs.
   make[2]: *** [CMakeFiles/thrift_ep.dir/build.make:93: thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install] Error 1
   make[1]: *** [CMakeFiles/Makefile2:758: CMakeFiles/thrift_ep.dir/all] Error 2
   gmake: *** [Makefile:160: all] Error 2
   + popd
   /tmp/RtmpFiYDeK/R.INSTALL19b11d5af287/arrow
   ------------------------- NOTE ---------------------------
   See https://arrow.apache.org/docs/r/articles/install.html
   for help installing Arrow C++ libraries
   ```
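
Since the installation exits successfully even when the C++ build fails, another option is to capture the install output (with `ARROW_R_DEV=true`) and scan it for failure markers. This is a minimal sketch: the patterns are taken from the log excerpt above and are an assumption about what is worth flagging, and the name `find_build_failures` is hypothetical.

```python
import re

# Markers taken from the log excerpt above; extend as needed.
FAILURE_PATTERNS = [
    re.compile(r"CMake Error"),
    # Matches "make: *** ", "make[3]: *** " and, via search, "gmake: *** ".
    re.compile(r"make(\[\d+\])?: \*\*\* "),
]


def find_build_failures(log_text):
    """Return the stripped log lines that match any known failure marker."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(p.search(line) for p in FAILURE_PATTERNS)
    ]
```

A non-empty result could be used to fail the image build at the installation step instead of waiting for the unit tests.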
   
   I'm also building images for Linux on Power (ppc64le). There, I couldn't 
install pyarrow from PyPI, because there are no wheels for that platform, and 
the source compilation failed. I eventually built custom versions of the conda 
packages pyarrow and arrow-cpp. Then I install R arrow from CRAN. This is 
still working today. I've enabled debug output here as well, to compare with 
x86. This is what I see there:
   ```txt
   inst/build_arrow_static.sh: line 54: /tmp/RtmpR54GjL/file28b55b5e1923/cmake-3.19.2-Linux-x86_64/bin/cmake: cannot execute binary file: Exec format error
   + /tmp/RtmpR54GjL/file28b55b5e1923/cmake-3.19.2-Linux-x86_64/bin/cmake --build . --target install
   inst/build_arrow_static.sh: line 84: /tmp/RtmpR54GjL/file28b55b5e1923/cmake-3.19.2-Linux-x86_64/bin/cmake: cannot execute binary file: Exec format error
   + popd
   /tmp/RtmpfCx9q8/R.INSTALL28965204fdde/arrow
   PKG_CFLAGS=-I/tmp/RtmpfCx9q8/R.INSTALL28965204fdde/arrow/libarrow/arrow-3.0.0/include -DARROW_R_WITH_ARROW
   PKG_LIBS=-larrow_dataset -lparquet -larrow
   ** libs
   ```
   So `cmake` is not even running on that platform, yet I get Docker images 
that work and pass the unit tests. Apparently, the native libs from the 
arrow-cpp conda package are found automatically, and satisfy whatever the R 
installation needs to compile its `arrow.so` library.
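
The `Exec format error` is consistent with the path in the log: the directory name `cmake-3.19.2-Linux-x86_64` indicates an x86_64 binary, which a ppc64le host cannot execute. A small sketch of a guard for that situation follows, assuming the `<tool>-<version>-<OS>-<arch>` naming seen in the log. The helper name `binary_matches_host` is hypothetical and not part of the arrow build scripts.

```python
import platform


def binary_matches_host(dir_name, host_machine=None):
    """Check whether a '<tool>-<version>-<OS>-<arch>' name targets this host.

    `host_machine` defaults to platform.machine(), e.g. 'x86_64' or 'ppc64le'.
    """
    host = host_machine or platform.machine()
    # The architecture is the last dash-separated component of the name.
    arch = dir_name.rsplit("-", 1)[-1]
    return arch == host
```

For the name from the log, `binary_matches_host("cmake-3.19.2-Linux-x86_64", "ppc64le")` returns `False`.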
   
   Ideally, I'd want the native libs from the PyPI wheel to be used by R arrow 
on the x86 platform. But symlinking the files into the lib directory where 
arrow-cpp puts them on ppc64le didn't do the trick.
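
For reference, pyarrow can report where a wheel keeps its headers and shared libraries through `pyarrow.get_include()`, `pyarrow.get_library_dirs()` and `pyarrow.get_libraries()`. The sketch below only collects those locations. Whether the R package can actually link against them is an open question here, not something this code establishes. The helper name `wheel_lib_locations` is hypothetical, and the module is passed in as a parameter so the function can be tried without pyarrow installed.

```python
def wheel_lib_locations(pa):
    """Collect header and library locations reported by a pyarrow module."""
    return {
        "include": pa.get_include(),
        "library_dirs": pa.get_library_dirs(),
        "libraries": pa.get_libraries(),
    }
```

With pyarrow installed, `import pyarrow; print(wheel_lib_locations(pyarrow))` prints the directories in question.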
   
   Is there a way to tell R arrow to use existing libs, and to tell it where 
those libs are located?
   If I have to install from a source tarball instead of CRAN, that would be 
OK. I'm more concerned about robustness than convenience.
   
   I'll try to collect more information about the thrift_ep problem. Because 
the installation does not fail, temporary files get removed. Maybe the problem 
even auto-resolves in a day or two, just as suddenly as it appeared. But 
I'm afraid this won't be the last time that something breaks during the R arrow 
installation, so I'd prefer to reduce the number of things that need to be 
downloaded and compiled at installation time.



