Jorge,

> * in rust, run integration tests against the latest apache/master on every
> PR

I've started to familiarize myself with the archery integration framework
over the last few days. Could you clarify for the "archery novices" what
exactly ^ this line would mean? Does apache/master refer to the C++
implementation as the "reference implementation", so that Rust would test
against/integrate with it? Or is it the Arrow JSON format that needs to be
consumed into valid in-memory Arrow, from which the same Arrow JSON is then
produced (this seems to be the extent of the Go integration tests at least)?

Sorry if this is easily answerable from knowing archery better, but I'm
still in the learning/discovery phase of how exactly all the integration
tests are set up and run.

-Jacob

On Sat, Apr 10, 2021 at 1:03 AM Jorge Cardoso Leitão
<jorgecarlei...@gmail.com> wrote:

> Hi,
>
> Wrt to integration tests, I agree that it is important to have a plan prior
> to this.
>
> What we have been doing in the apache/arrow:
>
> 1. only release if integration tests pass against each other
> 2. release the signed tar with the latest of every implementation (i.e.
>    master)
>
> My suggestion for independent versioning:
>
> CI:
>
> * in rust, run integration tests against the latest apache/master on every
>   PR
> * in apache/arrow, run integration tests against the latest released rust
>   version
>
> Release mechanism:
>
> 1. an arrow crate can only be released if it passes integration tests
>    against the current latest apache/arrow master
> 2. apache/arrow master can release if their integration tests pass against
>    the latest released rust crate
>
> The common scenario is that the integration tests in apache/arrow against
> Rust pass, and thus apache/arrow would just need to bundle the latest rust
> release.
>
> If tests in apache/arrow fail, then some change in apache/arrow
> caused our latest release to stop integrating (since we integration-tested
> that version against master prior to our release).
> This implies that a current Rust release is out of spec and we thus must
> release a patch asap to correct for this (just like we would need to push
> a commit to apache/arrow asap).
> Once that patch is released, apache/arrow becomes green again and
> apache/arrow can bundle these on the signed apache arrow release.
>
> In the unlikely event that the latest release is unable to pass integration
> tests *and* despite the best efforts Rust is unable to release a patch in
> time, we *may* still bundle a previous release of the Rust crate, thereby
> not blocking the whole release (i.e. this allows us to fall back to a
> previous release without a mass revert on the apache/arrow repo).
>
> > * If Rust runs against the latest nightly of Arrow then how will Rust
> > release without a new Arrow release?
>
> Not sure if this answers, but Rust does not compile or link against any
> implementation, so there are no ABI contracts. Its "only" contract is the
> spec (in-memory, IPC, flight, C data interface, etc).
>
> A related point is that when we release a Rust version, we can upload
> "integration test artifacts" separately (the same binaries that we
> currently use in our integration tests or a docker image with them), that
> apache/arrow can use to run integration tests.
> This would allow our CI at apache/arrow to download these artifacts and run
> tests as usual via archery and CLI, without having to compile them. This
> would alleviate some of the challenges around integration testing whereby
> every implementation is currently built on every run and in sequence.
>
> If someone thinks that it is useful, I would be happy to open a JIRA on
> this and draft a google docs to work out a technical design.
>
> Best,
> Jorge
>
>
> On Sat, Apr 10, 2021 at 1:57 AM Weston Pace <weston.p...@gmail.com> wrote:
>
> > I'm assuming the idea is that the existing integration tests will remain
> > in apache/arrow.
> > Will you also run the integration test suites on your rust
> > repository CI checks?
> >
> > Furthermore, against what version will these tests run?
> >
> > * If Arrow runs against the latest release of Rust then it will lag
> > behind and issues may be detected later.
> > * If Arrow runs against the latest nightly of Rust then things will
> > get tricky at release time (all Arrow integration tests pass but Rust
> > isn't ready to cut a new release and Arrow tests fail against the
> > latest released Rust).
> >
> > Assuming Rust is also running integration tests against Arrow
> > (probably a good idea) you get a similar problem (this one might be
> > trickier given the relative frequencies)...
> >
> > * If Rust runs against the latest release of Arrow then it will lag
> > behind (several months). There will be a "catching up" period after
> > Arrow releases.
> > * If Rust runs against the latest nightly of Arrow then how will Rust
> > release without a new Arrow release?
> >
> > Note, these problems technically exist now with the concept that any
> > language can release a patch at any time. Also, since Rust isn't
> > directly compiling against other Arrow libs and we are only talking
> > about interoperability it's probably not going to be too big of a
> > deal. Still, worth giving some thought ahead of time.
> >
> > On Fri, Apr 9, 2021 at 1:11 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > > With this explanation do you still have a concern? There is no
> > > > suggestion of making releases that depend on GitHub hashes.
> > >
> > > No, I don't think so. IIUC you are saying the crates dependency does
> > > not imply the crate artifacts are published elsewhere. This sounds in
> > > line with policies to me. For some reason I thought the notion of
> > > crates implied publishing to Rust's package management system.
> > >
> > > On Fri, Apr 9, 2021 at 4:07 PM Andy Grove <andygrov...@gmail.com> wrote:
> > >
> > > > Hi Micah,
> > > >
> > > > During development, the Rust crates have local dependencies on each
> > > > other based on relative file system paths. At release time, we change
> > > > these to versioned dependencies before publishing, because it isn't
> > > > possible to publish a crate that depends on non-published crates.
> > > >
> > > > With the code in separate repositories, we would still need an
> > > > equivalent mechanism for DataFusion to use the Arrow code that is
> > > > under development but we would point to a GitHub hash rather than a
> > > > relative path. We should still update to use versioned dependencies
> > > > when releasing.
> > > >
> > > > I will revise the text in the document to better explain what this
> > > > means.
> > > >
> > > > With this explanation do you still have a concern? There is no
> > > > suggestion of making releases that depend on GitHub hashes.
> > > >
> > > > Thanks,
> > > >
> > > > Andy.
> > > >
> > > > On Fri, Apr 9, 2021 at 4:57 PM Micah Kornfield <emkornfi...@gmail.com>
> > > > wrote:
> > > >
> > > >> > " Crates can depend on GitHub commit hashes between releases"
> > > >>
> > > >> This sounds like it might not align with ASF release policies [1].
> > > >>
> > > >> [1] https://www.apache.org/legal/release-policy.html#release-definition
> > > >>
> > > >> On Fri, Apr 9, 2021 at 1:34 PM Neal Richardson
> > > >> <neal.p.richard...@gmail.com> wrote:
> > > >>
> > > >> > Thanks, Andy. Two areas of concern I think we should have some
> > > >> > answer for before going forward with this (and I make no opinions
> > > >> > as to what the "right" answers are, just raising them for
> > > >> > discussion):
> > > >> >
> > > >> > 1. Integration testing: what is our workflow for ensuring that our
> > > >> > implementations are integration tested, and what do we do when
> > > >> > changes (whether in apache/arrow or in apache/arrow-rs) introduce
> > > >> > regressions/failures? I'm assuming the idea is that the existing
> > > >> > integration tests will remain in apache/arrow. Will you also run
> > > >> > the integration test suites on your rust repository CI checks?
> > > >> > 2. Versioning: one rationale from our current policy of "everyone
> > > >> > releases together" is that you don't have to guess as much whether
> > > >> > (for example) Arrow Java 3.0 and Arrow Rust 3.0 are compatible and
> > > >> > using the same format. It's kind of a heuristic for what library
> > > >> > versions were integration tested with each other. It sounds like
> > > >> > (but maybe I misunderstand) that y'all are looking to break from
> > > >> > that. But if Arrow C++ goes to version 7.0 by the end of the year
> > > >> > and arrow-rs chooses to go to 15.4, or 3.12, or whatever, does that
> > > >> > create confusion or doubt that works against the Arrow goal of easy
> > > >> > interoperability?
> > > >> >
> > > >> > Neal
> > > >> >
> > > >> > On Fri, Apr 9, 2021 at 8:18 AM Andy Grove <andygrov...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Following on from the email thread "Rust sync meeting" I would
> > > >> > > like to start a new discussion about moving the Rust components
> > > >> > > out to new GitHub repositories and using a new process for issues
> > > >> > > and release management.
> > > >> > >
> > > >> > > I have started a Google document [1] with details and to track
> > > >> > > the work required for this effort but I will summarize the key
> > > >> > > points of the proposal here:
> > > >> > >
> > > >> > > - Move existing Rust code into two new repositories
> > > >> > >   - apache/arrow-rs
> > > >> > >     - Arrow + Parquet crates
> > > >> > >   - apache/datafusion
> > > >> > >     - DataFusion + Ballista crates (which are expected to merge
> > > >> > >       to some degree over time)
> > > >> > >     - TPC-H benchmarks
> > > >> > > - Use GitHub issues for issue tracking
> > > >> > > - Decouple release process
> > > >> > >   - Crates are released individually
> > > >> > >   - A vote on the source release of the released crate is held
> > > >> > >     over the mailing list as usual.
> > > >> > >   - Rust does not need to release a new version when the rest of
> > > >> > >     Arrow releases; we bundle our latest released crates to the
> > > >> > >     signed tar.
> > > >> > >   - Crates can depend on GitHub commit hashes between releases
> > > >> > >
> > > >> > > The Google document may be the best place to collaborate on the
> > > >> > > proposal but I can update the document based on any comments in
> > > >> > > this email thread as well.
> > > >> > >
> > > >> > > Note that I have excluded discussion about arrow2/parquet2 from
> > > >> > > this proposal and I believe we should discuss that separately as
> > > >> > > a follow-on discussion.
> > > >> > >
> > > >> > > I look forward to hearing opinions on this both from current Rust
> > > >> > > maintainers and contributors and also from the wider Arrow
> > > >> > > community.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Andy.
> > > >> > >
> > > >> > > [1]
> > > >> > > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing