I signed up as "Javier Luraschi" with this email; if you could please give me access, that would be great. Thanks!
I'm assuming the CRAN documentation would go under https://cwiki.apache.org/confluence/display/ARROW/Distribution+Packages; I'll start adding it when I get access.

Yes, I mean https://github.com/apache/arrow/pull/3932.

Regarding "The challenge I see is that the development procedure is being commingled with packaging issues": yes, I agree! Let me send a PR to fix that as well. If a developer properly sets up the RTools development environment, they should not need to rely on rwinlibs.

Regarding "How would you suggest testing release candidates": this would be addressed by the previous point. That is, there needs to be support for building the RTools binaries locally. I'll work on this and follow up with the PR/JIRA issue once it's ready.

Regarding "Seems like this should be turned into a Crossbow task": right; however, I'm limited in time here. I'll open a JIRA issue to get some help from the community. I see this as a nice-to-have rather than a must-have, but I'll certainly add it to the Confluence docs.

Regarding "Is there a way to simulate this environment locally?": yes, this is called "R CMD check --as-cran". I'll add it to the Confluence docs as well.

Regarding "let's definitely copy this information into a page on the wiki": for sure.

Regarding "Given how manual the process is right now it seems like there's a solid chance that something will be broken after the 0.13": we need more automation, and we need maintainers who are used to building RTools binaries, etc. So yeah, the 0.13 release will probably be rough, but we will have to go through this and get better over time; I'm not sure we can automate everything on a first release.

I think the pending docs and the PR to decouple builds from release would address most of these concerns, correct? Otherwise, let me know.

Regarding "can you reply on the Timeline for 0.13 release": replied, and yes, I just marked the remaining JIRA issue as required for 0.13.
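As a starting point for the docs, the local "R CMD check --as-cran" step mentioned above could be scripted roughly as follows. This is only a minimal sketch, not a tested procedure: the helper names, the "r" package directory, and the tarball name "arrow_0.13.0.tar.gz" are assumptions for illustration, and the script simply wraps the standard "R CMD build" and "R CMD check --as-cran" commands.

```python
import shutil
import subprocess


def cran_check_commands(pkg_dir, tarball):
    """Return the two R commands for a local CRAN-style check.

    pkg_dir: directory holding the R package sources (assumed: "r").
    tarball: source tarball that `R CMD build` is expected to produce
             (hypothetical name, e.g. "arrow_0.13.0.tar.gz").
    """
    return [
        ["R", "CMD", "build", str(pkg_dir)],
        ["R", "CMD", "check", "--as-cran", str(tarball)],
    ]


def run_cran_check(pkg_dir, tarball):
    """Run the build and the check, stopping at the first failure."""
    if shutil.which("R") is None:
        raise RuntimeError("R was not found on PATH")
    for cmd in cran_check_commands(pkg_dir, tarball):
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, check=True)
```

Calling run_cran_check("r", "arrow_0.13.0.tar.gz") from the repository root would then build the source package and check it with CRAN's settings, assuming R (and, on Windows, RTools) is installed and on the PATH.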

Best,
Javier

On Mon, Mar 25, 2019 at 1:33 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Javier,
>
> Thank you for writing back.
>
> On Mon, Mar 25, 2019 at 12:41 PM Javier Luraschi <jav...@rstudio.com> wrote:
> >
> > Hi Wes, sorry for the delay; I haven't been monitoring this DL
> > proactively.
>
> Yes, I highly recommend setting up some e-mail filters so anything
> with "[R]" in the subject title lands in your inbox. You can also
> separate "[jira]" messages with a separate filter; there isn't very
> much list traffic if you split off the new issue notifications.
>
> > Please notice that I'm not the expert in this topic, so I'll share as
> > much information as I can, but others with more expertise should feel
> > free to comment as well. Please also note that some of the restrictions
> > we have are common practices in R packages that are out of our control,
> > at least without significant investment.
> >
> > I'll document what I know in this email, but please let me know if
> > there is a wiki or a better place to move this documentation into.
>
> Yes, let's definitely stash all of the build and packaging information
> on our wiki at
>
> https://cwiki.apache.org/confluence/display/ARROW
>
> If you let me know your ASF Confluence username I will give you edit
> permissions.
>
> > ## Background
> >
> > CRAN, the Comprehensive R Archive Network, is the most popular (primary)
> > package repo for the R community. You can think of CRAN as Homebrew or
> > PyPI. CRAN encourages cross-platform packages to be submitted and, to
> > ease compilation and testing, provides support to precompile binaries
> > for OS X and Windows. We will focus on Windows specifics from now on.
> >
> > CRAN and R rely on a set of tools based on Mingw to easily compile
> > packages on Windows; this toolset is known as RTools. Originally, Prof.
> > Brian Ripley and Duncan Murdoch put this toolset together; however,
> > Jeroen Ooms is its current maintainer. RTools is based on Mingw but,
> > from past experience, is not completely interchangeable with the
> > standard Mingw distribution. I'm afraid I don't have the details, but
> > this is mostly related to the specific packages, versions and compilers
> > included in RTools. It's possible to match a Mingw environment with
> > RTools but this is, in general, not a straightforward task.
>
> It would be good to have some links (on a wiki page) to any additional
> information about this.
>
> > A few months ago, I naively tried to accomplish this work myself. As in,
> > get RTools to compile Apache Arrow, how hard can it be? It's hard to
> > explain all the caveats in a single mail, but if you are interested,
> > you can read my own exploration of possible solutions to this problem
> > in this gist writeup [1].
> >
> > The outcome of this investigation, at least for me and my limited
> > knowledge, was to not try to do this on my own by reinventing the
> > wheel; otherwise, this would have taken months of my own time. The
> > solution was then to find out how other R packages have solved this
> > problem in the past.
> >
> > Given the specifics of the RTools toolchain, for complex projects with
> > a significant number of components and dependencies, the best (and
> > maybe only!) way to get R packages into CRAN on Windows is to
> > precompile the binaries outside of the CRAN build process. The repo of
> > precompiled packages is called rwinlibs [2] and has 75 packages and
> > growing. When compiling in CRAN, rather than building the library, it
> > simply gets downloaded from the rwinlibs repo.
> >
> > How then are the rwinlibs libraries built? All the packages are built
> > through an automated build system available under the rtools-packages
> > [3] repo, where an appveyor script detects changes and builds the
> > appropriate libraries. This repo runs with the latest RTools toolchain.
> > To support previous versions of R/RTools, the rtools-backports [4] repo
> > provides backward compatibility in an automated way.
> >
> > So now we can get back to discussing how we want to make this work in
> > the arrow project. One way, which this PR encourages, is to say "Let's
> > not worry about what the R/CRAN publishing process is, they have their
> > own processes and tools to build binaries for Windows. This is similar
> > to brew formulae, the formula that builds arrow for OS X using homebrew
> > is in a different repo [5]".
>
> When you say "this PR" you mean
>
> https://github.com/apache/arrow/pull/4011
>
> or
>
> https://github.com/apache/arrow/pull/3932
>
> The challenge I see is that the development procedure is being
> commingled with packaging issues. I would like to see a write-up to
> provide instructions for an Arrow developer to create a build of Arrow
> on the master branch using mingw/Rtools for the purposes of
> development. If we don't have this written down, this is putting us in
> a potentially very bad situation where developers cannot debug issues.
> I think it's fine if all of the other C++ dependencies are snapshotted
> in rwinlibs.
>
> > While splitting the release processes into multiple repos has some
> > advantages, it certainly has some caveats. For instance, when
> > publishing a new release of arrow in Homebrew, one needs to manually go
> > and update the Homebrew formula.
> >
> > That said, I would hope that the Homebrew release process is documented
> > in the Arrow project in the same way that we should document the R
> > release process in the Arrow project. Hopefully this mail helps build a
> > first iteration on this.
> >
> > ## Releasing
> >
> > These instructions are a bit more pragmatic as to what needs to be done
> > to release the R package in CRAN:
> >
> > (1) Send a PR to rtools-packages [3] and increment the version; the
> >     repo already downloads the binaries from the Arrow GitHub project.
> >     Ensure that the appveyor build succeeds. If the build or tests
> >     fail, send the appropriate PR to the official Arrow repo.
>
> How would you suggest testing release candidates or otherwise doing
> some form of continuous integration / integration testing to ensure we
> haven't broken this step?
>
> > (2) Send a PR to rtools-backports [4], similar to (1) but a different
> >     repo.
>
> Seems like this should be turned into a Crossbow task in this project
> (see https://github.com/apache/arrow/tree/master/dev/tasks) so that it
> can be maintained by the Arrow community. This is how we are handling
> package automation for Linux packages, Python wheels, Gandiva JARs,
> etc. This also may enable the integration testing I described above to
> take place (though having an Appveyor build would be superior).
>
> > (3) Copy the output produced by (1) and (2) as a PR to the
> >     rwinlib/arrow [6] repo.
> > (4) Before merging (3), validate that CRAN can build and test using the
> >     new library using the winbuilder service [7]. This service is
> >     maintained by CRAN and allows you to pre-check that a package
> >     builds properly under a CRAN-like build machine for Windows.
>
> Is there a way to simulate this environment locally?
>
> > (5) Submit the package to CRAN; make sure their practices and processes
> >     are followed [8].
> >
> > While I did my best to document the steps, there are certainly more
> > details that can be added over time. Regardless, feel free to reach out
> > to me with questions, support requests, and so on, and I'll try my best
> > to address them.
> >
>
> OK, let's definitely copy this information into a page on the wiki so
> that these steps can be maintained as time goes on. The goal would be
> to have sufficient detail to increase the bus factor involved with
> post-release tasks.
>
> Given how manual the process is right now it seems like there's a
> solid chance that something will be broken after the 0.13 release is
> out. Speaking of which, can you reply on the "Timeline for 0.13
> release" thread about any PRs that need to get merged? Please set the
> "Fix Version" so they show up in the list of 0.13 issues.
>
> Thanks,
> Wes
>
> > Best, Javier
> >
> > [1]: https://gist.github.com/javierluraschi/2ade2204364a7c20e9c3d95504d12ce5
> > [2]: https://github.com/rwinlib/
> > [3]: https://github.com/r-windows/rtools-packages
> > [4]: https://github.com/r-windows/rtools-backports
> > [5]: https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-arrow.rb
> > [6]: https://github.com/rwinlib/arrow
> > [7]: https://win-builder.r-project.org/
> > [8]: https://cran.r-project.org/submit.html
> >
> > On Sat, Mar 16, 2019 at 1:10 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi folks,
> > >
> > > I have noticed there is work under way to prepare Apache Arrow for
> > > submission to the CRAN package manager for R users. I'm slightly
> > > concerned about the lack of information and documentation in the
> > > project regarding what is involved with this effort. This patch in
> > > particular raised some eyebrows
> > >
> > > https://github.com/apache/arrow/pull/3932
> > >
> > > This introduces a dependency into the project on pre-built static
> > > libraries based on processes that aren't documented in the project. I
> > > see this repository containing these static libraries for the R
> > > Windows toolchain, but if I needed to produce them myself I would not
> > > know what to do
> > >
> > > https://github.com/rwinlib/arrow
> > >
> > > Additionally, in general, if I wanted to build and test Arrow and R
> > > from source on Windows, I also would not know what to do.
> > >
> > > In the Python world, this would be akin to depending on e.g.
> > > conda-forge packages for Windows development, but not having any
> > > information in the repository about how to build Arrow C++ and Python
> > > from source on Windows.
> > >
> > > So I would like to see some transparency / documentation around the
> > > scripts and processes involved with this so that we don't end up with
> > > a "bus factor" problem where Arrow PMC members are unable to undertake
> > > basic maintenance and release management activities. Currently the
> > > work that is going on seems opaque to me and as such feels contrary to
> > > the Apache Way.
> > >
> > > I understand that there is some urgency to make the Arrow libraries
> > > available to R users, but I want to make sure we are working in a
> > > sustainable manner to grow a community of developers who are able to
> > > do work on each part of the project.
> > >
> > > Thanks,
> > > Wes