Can we just run the versions through? I.e., increment the version number on every release, but release only one component at each number (or both, if they happen to align, I guess). E.g., storage-api would be released at 2.2, and, say, at 2.3 if it moves fast, then Hive at 2.4, then storage-api at 2.5, etc. That might make it easier to reason about compatibility because the order is obvious.
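To make the idea concrete, here is a rough Python sketch of a single shared version counter, under the assumption that "obvious order" means a Hive release pairs with the newest storage-api release numbered below it. The class and method names (`SharedVersionLine`, `release`, `newest_before`) are made up for illustration and are not part of any Hive or storage-api tooling.

```python
from dataclasses import dataclass, field


@dataclass
class SharedVersionLine:
    """One minor-version counter shared by several components (hypothetical)."""

    major: int = 2
    next_minor: int = 2
    # Each entry is ((major, minor), (component, ...)).
    history: list = field(default_factory=list)

    def release(self, *components):
        """Release one or more components at the next shared version number."""
        version = (self.major, self.next_minor)
        self.history.append((version, tuple(components)))
        self.next_minor += 1
        return f"{version[0]}.{version[1]}"

    def newest_before(self, component, version):
        """Return the newest release of `component` numbered below `version`."""
        target = tuple(int(part) for part in version.split("."))
        earlier = [
            v for v, comps in self.history
            if component in comps and v < target
        ]
        if not earlier:
            return None
        major, minor = max(earlier)
        return f"{major}.{minor}"


line = SharedVersionLine()
line.release("storage-api")  # 2.2
line.release("storage-api")  # 2.3, storage-api moving fast
line.release("hive")         # 2.4
line.release("storage-api")  # 2.5

# Which storage-api release precedes Hive 2.4 in the shared order?
print(line.newest_before("storage-api", "2.4"))  # 2.3
```

The point of the sketch is only that a single counter gives a total order across both components; whether Hive 2.4 would actually build against storage-api 2.3 is still a choice the build has to record.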
On 16/8/19, 09:04, "Sergio Pena" <sergio.p...@cloudera.com> wrote:

>I see Parquet is currently using the SearchArgument class for predicates
>push down.
>Will this class be part of the new sub-module or project?
>
>Following Sushanth idea, can we have other API interfaces in the new
>project that other components can use?
>Perhaps having this may be a good reason to create a project.
>
>I'm -1 with the 4th minor version. As Owen mentioned, changing the 4th
>version number for incompatible changes is ugly and confusing.
>I like the new project idea more, +1, but the storage-api may be too small
>for a new project.
>
>- Sergio
>
>On Wed, Aug 17, 2016 at 2:05 PM, Owen O'Malley <omal...@apache.org> wrote:
>
>> On Wed, Aug 17, 2016 at 10:46 AM, Alan Gates <alanfga...@gmail.com> wrote:
>>
>> > +1 for making the API clean and easy for other projects to work with. A
>> > few questions:
>> >
>> > 1) Would this also make it easier for Parquet and others to implement
>> > Hive’s ACID interfaces?
>> >
>>
>> Currently the ACID interfaces haven't been moved over to storage-api,
>> although it would make sense to do so at some point.
>>
>> >
>> > 2) Would we make any attempt to coordinate version numbers between Hive
>> > and the storage module, or would a given version of Hive just depend on a
>> > given version of the storage module?
>> >
>>
>> The two options that I see are:
>>
>> * Let the numbers run separately starting from 2.2.0.
>> * Tie the numbers together with an additional level of versioning (eg.
>>   2.2.0.0).
>>
>> I think that letting the two version numbers diverge is better in the long
>> term. For example, if you need to make an incompatible change, it is pretty
>> ugly to do it as a fourth level version number (eg. an incompatible change
>> from 2.2.0.0 to 2.2.0.1). At the beginning, I expect that storage-api would
>> move faster than Hive, but as it stabilizes I expect it might start moving
>> slower than Hive.
>>
>> I'd propose that we have Hive's build use a released version of storage-api
>> rather than a snapshot.
>>
>> Thoughts?
>>
>> Owen
>>
>>
>> > Alan.
>> >
>> > > On Aug 15, 2016, at 17:01, Owen O'Malley <omal...@apache.org> wrote:
>> > >
>> > > All,
>> > >
>> > > As part of moving ORC out of Hive, we pulled all of the vectorization
>> > > storage and sarg classes into a separate module, which is named
>> > > storage-api. Although it is currently only used by ORC, it could be used
>> > > by Parquet or Avro if they wanted to make a fast vectorized reader that
>> > > read directly in to Hive's VectorizedRowBatch without needing a shim or
>> > > data copy. Note that this is in many ways similar to pulling the Arrow
>> > > project out of Drill.
>> > >
>> > > This unfortunately still leaves us with a circular dependency between Hive
>> > > and ORC. I'd hoped that storage-api wouldn't change that much, but that
>> > > doesn't seem to be happening. As a result, ORC ends up shipping its own
>> > > fork of storage-api.
>> > >
>> > > Although we could make a new project for just the storage-api, I think it
>> > > would be better to make it a subproject of Hive that is released
>> > > independently.
>> > >
>> > > What do others think?
>> > >
>> > > Owen
>> >
>> >
>>