Neal Richardson created ARROW-5686:
--------------------------------------
Summary: [R] Review R Windows CI build
Key: ARROW-5686
URL: https://issues.apache.org/jira/browse/ARROW-5686
Project: Apache Arrow
Issue Type: Improvement
Reporter: Neal Richardson
Assignee: Jeroen
Fix For: 0.14.0
Follow-up to ARROW-3758 / [https://github.com/apache/arrow/pull/4622]. In that PR,
I leveraged the tools in [https://github.com/r-windows/rtools-backports] to set
up CI for Arrow C++ and R on Windows using Appveyor. I was guided mainly by the
steps described
[here|https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-BuildingWindowspackages]
on the Arrow project wiki and iterated until I got a passing build.
Despite getting it to "work", I'm certain I've missed some subtleties, and
there may be better ways to accomplish this. Some specific questions:
* I found that I could ignore rtools-backports/ci-library.sh and most of
ci-build.sh because they are oriented around building possibly many packages, but
there was a block of {{pacman}} commands I did have to copy here:
[https://github.com/apache/arrow/pull/4622/files#diff-f4a8bedb9b0d3fe301a84914916f6d49R22].
I'm not sure how much these are likely to change, but if that's a concern,
maybe that setup could be factored out to a separate shell script in
rtools-backports, and the arrow CI could {{wget}} and {{source}} it like it
does some other resources. That way, our setup here wouldn't diverge.
* I did not understand what I needed to do with rtools-packages, if anything.
It seems that it's not used by R yet, so is it just important to have the
PKGBUILD in place there for when it is ready? If I wanted to build both
rtools-backports and rtools-packages builds in the same job, is the difference
only [these environment
variables|https://github.com/r-windows/rtools-backports/blob/master/mingw-w64-arrow/PKGBUILD#L48-L52]?
* The process of taking the appveyor build artifacts, unzipping them, and
merging them into the "rwinlib" directory layout seemed loose and poorly
defined on the wiki, at least as far as I could tell. I packaged up the process (as I
understood it) in a [shell
script|https://github.com/apache/arrow/pull/4622/files#diff-c043cda9f4ed847b06efeeacf04634ee],
and it produced a zip file that is the right shape (right enough that R could
install the arrow R package with it and run tests). Does that script make
sense? In particular,
** Is there a good way to keep around the other dependencies
(double-conversion, boost, thrift) from when the packages are built so that I
don't have to re-download them from bintray? I see that they get pulled down at
the beginning of each pkgbuild and then removed after, but I don't know where
they are put such that I could keep them around and use them later.
** Is the {{lib}} directory for other dependencies (e.g.
libdouble-conversion.a) and {{lib-4.9.3}} for the arrow and parquet binaries we
build, as the wiki says? Or is {{lib}} for the Rtools4.0/gcc8 versions and
{{lib-4.9.3}} for the Rtools3.5/gcc4 versions?
** libdouble-conversion.a seems to exist only in the rtools-packages Rtools4.0
packages, yet it nevertheless works on the R release version. However, if I
used the versions of boost and thrift from the Rtools4.0 bintrays, the R
package did not build (link) correctly.
To be clear, it is not our intention to fork or otherwise avoid the supported
Rtools toolchain that is maintained there; rather, we want to continuously
integrate arrow to avoid breaking things and make it easier to submit updates
to rtools-backports/packages/rwinlib when there's a new arrow release. We want
as much as possible to use the supported tools and workflows and are willing to
contribute to enhancing them, though we recognize that our needs (as a big C++
library under heavy active development) are probably not shared by many other
projects that use rtools-packages et al.