Added entry for "Updating CRAN packages" here: https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingCRANpackages
I'm sure we will have to update it with more details as the process changes, but it should be a good start if anyone has to update the CRAN packages. However, I also created a standalone page with the same content that now needs to be deleted (I don't have delete permissions, and probably shouldn't): https://cwiki.apache.org/confluence/display/ARROW/Updating+CRAN+packages. If someone could please delete that standalone "Updating CRAN packages" page, that would be great, thank you!

On Thu, Mar 28, 2019 at 6:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
> Thanks Javier, I just gave you edit permissions on the wiki.
>
> On Mon, Mar 25, 2019 at 4:55 PM Javier Luraschi <jav...@rstudio.com> wrote:
> >
> > I signed up as "Javier Luraschi" with this email; if you could please give me access, that would be great. Thanks!
> >
> > I'm assuming the CRAN documentation would go under:
> > https://cwiki.apache.org/confluence/display/ARROW/Distribution+Packages
> > I'll start adding it when I get access.
>
> I think this page is a bit different; it shows "where to find the packages". It might be a good idea to create an "R developer guide" or similar under
>
> https://cwiki.apache.org/confluence/display/ARROW#ApacheArrowHome-RLibraries
>
> >
> > Yes, I mean https://github.com/apache/arrow/pull/3932.
> >
> > Regarding "The challenge I see is that the development procedure is being commingled with packaging issues": yes, I agree! Let me send a PR to fix that as well. If a developer properly sets up the RTools development environment, they should not need to rely on rwinlibs.
> >
> > Regarding "How would you suggest testing release", this would be addressed by the previous comment. As in, there needs to be support for building the RTools binaries locally. I'll work on this and follow up with the PR/JIRA issue once it's ready.
> >
> > Regarding "Seems like this should be turned into a Crossbow task", right; however, I'm limited in time here.
> > I'll open a Jira issue to get some help from the community. I see this as a nice-to-have rather than a must-have, but I'll certainly add this to the Confluence docs.
> >
> > Regarding "Is there a way to simulate this environment locally?": yes, this is called "R CMD check --as-cran"; I'll add it to the Confluence docs as well.
> >
> > Regarding "let's definitely copy this information into a page on the wiki": for sure.
> >
> > Regarding "Given how manual the process is right now it seems like there's a solid chance that something will be broken after the 0.13": we need more automation and maintainers who are used to building RTools binaries, etc., so yeah, 0.13 will probably be rough, but we will have to go through this and get better over time; I'm not sure we can automate everything on a first release.
> >
> > Yes, I'll reply on the "Timeline for 0.13 release" thread.
> >
> > I think that, pending the docs and the PR to decouple builds from the release, this would address most of these concerns, correct? Otherwise, let me know.
> >
> > Regarding "can you reply on the Timeline for 0.13 release": replied, and yes, I just marked the remaining JIRA issue as required for 0.13.
> >
>
> Yes, I think so. Given the diversity of the community, I think we should strive to create a humane, well-documented developer experience that does not rely on deep institutional knowledge (which can be hard to come by) to undertake basic workflows. That way it will be easier for folks less well-versed in nitty-gritty CRAN stuff to at least build the project from source and test things out.
>
> Thanks
> Wes
>
> > Best, Javier
> >
> > On Mon, Mar 25, 2019 at 1:33 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi Javier,
> > >
> > > Thank you for writing back.
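For the local check Javier mentions, a minimal Python sketch that wraps the two standard commands, assuming R (plus RTools on Windows, for compiled code) is installed and on PATH. The package directory, name and version in the example call are hypothetical placeholders. `--as-cran` turns on the extra checks CRAN applies to incoming submissions, so this approximates, but does not replace, the win-builder pre-check in the release steps.

```python
import shutil
import subprocess


def cran_check_commands(pkg_dir: str, pkg_name: str, version: str) -> list[list[str]]:
    """Build the argv lists for a CRAN-like local check.

    `R CMD build <dir>` writes <pkg_name>_<version>.tar.gz into the current
    working directory; `R CMD check --as-cran` is then run on that tarball
    with CRAN's incoming-check customizations enabled.
    """
    tarball = f"{pkg_name}_{version}.tar.gz"
    return [
        ["R", "CMD", "build", pkg_dir],
        ["R", "CMD", "check", "--as-cran", tarball],
    ]


def run_cran_check(pkg_dir: str, pkg_name: str, version: str) -> None:
    """Run build, then check, raising CalledProcessError on the first failure."""
    if shutil.which("R") is None:
        raise RuntimeError("R was not found on PATH")
    for cmd in cran_check_commands(pkg_dir, pkg_name, version):
        subprocess.run(cmd, check=True)


# Example call (hypothetical directory, package name and version):
# run_cran_check("r", "arrow", "0.13.0")
```

Checking the built tarball rather than the source directory mirrors what CRAN receives on submission.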
> > >
> > > On Mon, Mar 25, 2019 at 12:41 PM Javier Luraschi <jav...@rstudio.com> wrote:
> > > >
> > > > Hi Wes, sorry for the delay; I haven't been monitoring this DL proactively.
> > >
> > > Yes, I highly recommend setting up some e-mail filters so anything with "[R]" in the subject line lands in your inbox. You can also separate "[jira]" messages with a separate filter; there isn't very much list traffic if you split off the new issue notifications.
> > >
> > > >
> > > > Please note that I'm not the expert on this topic, so I'll share as much information as I can, but others with more expertise should feel free to comment as well. Please also note that some of the restrictions we have are common practices in R packages that are out of our control, at least without significant investment.
> > > >
> > > > I'll document what I know in this email, but please let me know if there is a wiki or a better place to move this documentation into.
> > > >
> > >
> > > Yes, let's definitely stash all of the build and packaging information on our wiki at
> > >
> > > https://cwiki.apache.org/confluence/display/ARROW
> > >
> > > If you let me know your ASF Confluence username I will give you edit permissions
> > >
> > > > ## Background
> > > >
> > > > CRAN, the Comprehensive R Archive Network, is the most popular (primary) package repo for the R community. You can think of CRAN as Homebrew or PyPI. CRAN encourages the submission of cross-platform packages and, to ease compilation and testing, provides precompiled binaries for OS X and Windows. From here on, we will focus on Windows specifics.
> > > >
> > > > CRAN and R rely on a set of tools based on Mingw to easily compile packages on Windows; this toolset is known as RTools. Originally, Prof. Brian Ripley and Duncan Murdoch put this toolset together; however, Jeroen Ooms is its current maintainer. RTools is based on Mingw but, from past experience, is not completely interchangeable with the standard Mingw distribution. I'm afraid I don't have the details, but this is mostly related to the specific packages, versions and compilers included in RTools. It's possible to match a Mingw environment with RTools, but this is, in general, not a straightforward task.
> > >
> > > It would be good to have some links (on a wiki page) to any additional information about this.
> > >
> > > >
> > > > A few months ago, I naively tried to accomplish this work myself. As in, get RTools to compile Apache Arrow, how hard can it be? It's hard to explain all the caveats in a single mail, but if you are interested, you can read my own exploration of possible solutions to this problem in this gist write-up [1].
> > > >
> > > > The outcome of this investigation, at least for me and my limited knowledge, was not to try to do this on my own by reinventing the wheel; otherwise, it would have taken months of my own time. The solution was then to find out how other R packages have solved this problem in the past.
> > > >
> > > > Given the specifics of the RTools toolchain, for complex projects with a significant number of components and dependencies, the best (and maybe only!) way to get R packages into CRAN on Windows is to precompile the binaries outside of the CRAN build process. The repo of precompiled packages is called rwinlibs [2] and has 75 packages and growing. When the package is compiled on CRAN, rather than building the library, it simply gets downloaded from the rwinlibs repo.
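To make the "download instead of build" idea concrete, here is a minimal Python sketch. In R packages that use rwinlib this step is typically done by R code run from the package's Windows configure step; the Python below only mirrors the idea. The `v<version>` tag naming and the `windows` destination directory are assumptions for illustration, not something confirmed in this thread.

```python
import io
import urllib.request
import zipfile


def rwinlib_archive_url(lib: str, version: str) -> str:
    """GitHub source-archive URL for a tagged release of rwinlib/<lib>.

    Assumption: releases are tagged "v<version>"; GitHub serves any tag
    as a zip at /archive/<tag>.zip.
    """
    return f"https://github.com/rwinlib/{lib}/archive/v{version}.zip"


def fetch_rwinlib(lib: str, version: str, dest: str) -> list[str]:
    """Download the prebuilt library archive and unpack it into dest.

    This stands in for compiling the C++ library during the package build.
    Returns the names of the extracted entries.
    """
    url = rwinlib_archive_url(lib, version)
    with urllib.request.urlopen(url) as resp:
        payload = resp.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Example call (hypothetical version and destination; downloads from GitHub):
# fetch_rwinlib("arrow", "0.13.0", "windows")
```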
> > > >
> > > > How, then, are the rwinlibs libraries built? All the packages are built through an automated build system available under the rtools-packages [3] repo, where an AppVeyor script detects changes and builds the appropriate libraries. This repo runs with the latest RTools toolchain. To support previous versions of R/RTools, the rtools-backports [4] repo provides backward compatibility in an automated way.
> > > >
> > > > So now we can get back to discussing how we want to make this work in the Arrow project. One way, which this PR encourages, is to say "Let's not worry about what the R/CRAN publishing process is; they have their own processes and tools to build binaries for Windows. This is similar to Homebrew formulae: the formula that builds Arrow for OS X using Homebrew is in a different repo [5]".
> > >
> > > When you say "this PR" do you mean
> > >
> > > https://github.com/apache/arrow/pull/4011
> > >
> > > or
> > >
> > > https://github.com/apache/arrow/pull/3932
> > >
> > > The challenge I see is that the development procedure is being commingled with packaging issues. I would like to see a write-up that provides instructions for an Arrow developer to create a build of Arrow on the master branch using mingw/Rtools for the purposes of development. If we don't have this written down, this puts us in a potentially very bad situation where developers cannot debug issues. I think it's fine if all of the other C++ dependencies are snapshotted in rwinlibs.
> > >
> > > >
> > > > While splitting the release processes into multiple repos has some advantages, it certainly has some caveats. For instance, when publishing a new release of Arrow in Homebrew, one needs to manually go and update the Homebrew formula.
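The "detects changes and builds the appropriate libraries" step can be illustrated with a small Python sketch. This is not the actual rtools-packages AppVeyor script; it assumes, for illustration only, that each library's recipe lives in its own top-level directory (the `mingw-w64-arrow` name is a hypothetical example) and that the CI job compares two git commits.

```python
import subprocess
from pathlib import PurePosixPath


def changed_packages(changed_files: list[str]) -> list[str]:
    """Return the sorted top-level package directories touched by a change.

    Assumes one package recipe per top-level directory (for example
    mingw-w64-arrow/PKGBUILD); files at the repository root are ignored.
    """
    packages: set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 2:
            packages.add(parts[0])
    return sorted(packages)


def git_changed_files(base: str = "HEAD~1", head: str = "HEAD") -> list[str]:
    """List paths changed between two commits via `git diff --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


# Example in a CI job (hypothetical): rebuild only what changed.
# for pkg in changed_packages(git_changed_files()):
#     print("would build", pkg)
```

Keying rebuilds on directories keeps each CI run limited to the libraries whose recipes actually changed.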
> > > >
> > > > That said, I would hope that the Homebrew release process is documented in the Arrow project in the same way that we should document the R release process in the Arrow project. Hopefully this mail helps build a first iteration of this.
> > > >
> > > > ## Releasing
> > > >
> > > > These instructions are a bit more pragmatic as to what needs to be done to release the R package on CRAN:
> > > >
> > > > (1) Send a PR to the rtools-packages [3] repo that increments the version; the repo already downloads the binaries from the Arrow GitHub project. Ensure that the AppVeyor build succeeds. If the build or tests fail, send the appropriate PR to the official Arrow repo.
> > >
> > > How would you suggest testing release candidates or otherwise doing some form of continuous integration / integration testing to ensure we haven't broken this step?
> > >
> > > > (2) Send a PR to the rtools-backports [4] repo; this is similar to (1) but in a different repo.
> > >
> > > Seems like this should be turned into a Crossbow task in this project (see https://github.com/apache/arrow/tree/master/dev/tasks) so that it can be maintained by the Arrow community. This is how we are handling package automation for Linux packages, Python wheels, Gandiva JARs, etc. This also may enable the integration testing I described above to take place (though having an AppVeyor build would be superior).
> > >
> > > > (3) Copy the output produced by (1) and (2) into a PR to the rwinlib/arrow [6] repo.
> > > > (4) Before merging (3), use the win-builder service [7] to validate that CRAN can build and test the package using the new library. This service is maintained by CRAN and allows you to pre-check that a package builds properly on a CRAN-like build machine for Windows.
> > >
> > > Is there a way to simulate this environment locally?
> > >
> > > > (5) Submit the package to CRAN, making sure their practices and processes are followed [8].
> > > >
> > > > While I did my best to document the steps, there are certainly more details that can be added over time. Regardless, feel free to reach out to me with questions, support requests or anything else, and I'll try my best to address them.
> > > >
> > >
> > > OK, let's definitely copy this information into a page on the wiki so that these steps can be maintained as time goes on. The goal would be to have sufficient detail to increase the bus factor involved with post-release tasks.
> > >
> > > Given how manual the process is right now, it seems like there's a solid chance that something will be broken after the 0.13 release is out. Speaking of which, can you reply on the "Timeline for 0.13 release" thread about any PRs that need to get merged? Please set the "Fix Version" so they show up in the list of 0.13 issues.
> > >
> > > Thanks,
> > > Wes
> > >
> > > > Best, Javier
> > > >
> > > > [1]: https://gist.github.com/javierluraschi/2ade2204364a7c20e9c3d95504d12ce5
> > > > [2]: https://github.com/rwinlib/
> > > > [3]: https://github.com/r-windows/rtools-packages
> > > > [4]: https://github.com/r-windows/rtools-backports
> > > > [5]: https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-arrow.rb
> > > > [6]: https://github.com/rwinlib/arrow
> > > > [7]: https://win-builder.r-project.org/
> > > > [8]: https://cran.r-project.org/submit.html
> > > >
> > > > On Sat, Mar 16, 2019 at 1:10 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > > hi folks,
> > > > >
> > > > > I have noticed there is work under way to prepare Apache Arrow for submission to the CRAN package manager for R users.
> > > > > I'm slightly concerned about the lack of information and documentation in the project regarding what is involved with this effort. This patch in particular raised some eyebrows:
> > > > >
> > > > > https://github.com/apache/arrow/pull/3932
> > > > >
> > > > > This introduces a dependency into the project on pre-built static libraries based on processes that aren't documented in the project. I see this repository containing these static libraries for the R Windows toolchain, but if I needed to produce them myself I would not know what to do:
> > > > >
> > > > > https://github.com/rwinlib/arrow
> > > > >
> > > > > Additionally, in general, if I wanted to build and test Arrow and R from source on Windows, I also would not know what to do.
> > > > >
> > > > > In the Python world, this would be akin to depending on, e.g., conda-forge packages for Windows development, but not having any information in the repository about how to build Arrow C++ and Python from source on Windows.
> > > > >
> > > > > So I would like to see some transparency / documentation around the scripts and processes involved with this so that we don't end up with a "bus factor" problem where Arrow PMC members are unable to undertake basic maintenance and release management activities. Currently the work that is going on seems opaque to me and as such feels contrary to the Apache Way.
> > > > >
> > > > > I understand that there is some urgency to make the Arrow libraries available to R users, but I want to make sure we are working in a sustainable manner to grow a community of developers who are able to do work on each part of the project.
> > > > >
> > > > > Thanks,
> > > > > Wes
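The ordering constraints in Javier's release steps (1) to (5) can be captured as a small dependency graph, which is one possible starting point for the automation discussed in the thread. A key constraint is that (4) must succeed before the rwinlib/arrow PR from (3) is merged. The minimal Python 3.9+ sketch below uses the standard-library `graphlib` module. The step labels are paraphrases, and splitting (3) into "open" and "merge" is an editorial reading of the steps, not part of the documented process.

```python
from graphlib import TopologicalSorter

# Paraphrased step descriptions; "3-open"/"3-merge" split step (3) because
# step (4) has to pass before the rwinlib/arrow PR is merged.
DESCRIPTIONS = {
    "1": "PR to rtools-packages bumping the version; AppVeyor build passes",
    "2": "Same as (1) against rtools-backports",
    "3-open": "Open a PR to rwinlib/arrow with the outputs of (1) and (2)",
    "4": "Check that CRAN can build and test the package on win-builder",
    "3-merge": "Merge the rwinlib/arrow PR",
    "5": "Submit the package to CRAN following its policies",
}

# Each step maps to the steps that must be finished before it can start.
PREREQUISITES = {
    "1": set(),
    "2": set(),
    "3-open": {"1", "2"},
    "4": {"3-open"},
    "3-merge": {"4"},
    "5": {"3-merge"},
}


def release_order() -> list[str]:
    """Return one valid execution order (raises graphlib.CycleError on a cycle)."""
    return list(TopologicalSorter(PREREQUISITES).static_order())


def checklist() -> str:
    """Render the ordered steps as a plain-text checklist."""
    return "\n".join(f"[ ] ({step}) {DESCRIPTIONS[step]}" for step in release_order())
```

Steps (1) and (2) have no prerequisites, so they may be done in either order or in parallel; everything after them forms a strict chain ending in the CRAN submission.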