Re: [DISCUSS] Making storage-api a separately released artifact

Sergey Shelukhin Fri, 19 Aug 2016 18:52:46 -0700

I am suggesting we always skip the number. So only one component gets the
next one :) In your example, Hive trunk would be 2.3, and if SA is released
again it would become 2.4. Otherwise we'd need a compatibility table, because
the versions would be totally out of sync.


On 16/8/19, 16:31, "Owen O'Malley" <omal...@apache.org> wrote:

>That won't necessarily work, especially in the beginning. Suppose we release
>SA 2.2.0 and use it for Hive trunk, assuming that the next Hive release will
>be 2.2. What do we do when we need to make an incompatible change in SA? I
>guess we could release SA as 2.3.0 and, when Hive makes its next release,
>skip over Hive 2.2 and go straight to Hive 2.3.0. In general, I think that
>we'd be better off with the release numbers not tied together.
>
>.. Owen
>
>On Fri, Aug 19, 2016 at 4:14 PM, Sergey Shelukhin <ser...@hortonworks.com> wrote:
>
>> Can we just run the versions through? I.e., increment the version every
>> time but release only one component (or both, if they happen to align, I
>> guess). E.g., storage-api will be released at 2.2, and say 2.3 if it moves
>> fast, then Hive 2.4, then storage-api 2.5, etc.
>> That might make it easier to reason about compatibility, because the order
>> is obvious.
>>
>> On 16/8/19, 09:04, "Sergio Pena" <sergio.p...@cloudera.com> wrote:
>>
>> >I see Parquet is currently using the SearchArgument class for predicate
>> >pushdown.
>> >Will this class be part of the new sub-module or project?
>> >
>> >Following Sushanth's idea, can we have other API interfaces in the new
>> >project that other components can use?
>> >That may be a good reason to create a project.
>> >
>> >I'm -1 on the 4th-level version number. As Owen mentioned, changing the
>> >4th version number for incompatible changes is ugly and confusing.
>> >I like the new project idea more, +1, but the storage-api may be too
>> >small for a new project.
>> >
>> >- Sergio
>> >
>> >On Wed, Aug 17, 2016 at 2:05 PM, Owen O'Malley <omal...@apache.org> wrote:
>> >
>> >> On Wed, Aug 17, 2016 at 10:46 AM, Alan Gates <alanfga...@gmail.com> wrote:
>> >>
>> >> > +1 for making the API clean and easy for other projects to work with.
>> >> > A few questions:
>> >> >
>> >> > 1) Would this also make it easier for Parquet and others to implement
>> >> > Hive’s ACID interfaces?
>> >> >
>> >>
>> >> Currently the ACID interfaces haven't been moved over to storage-api,
>> >> although it would make sense to do so at some point.
>> >>
>> >>
>> >> >
>> >> > 2) Would we make any attempt to coordinate version numbers between
>> >> > Hive and the storage module, or would a given version of Hive just
>> >> > depend on a given version of the storage module?
>> >> >
>> >>
>> >> The two options that I see are:
>> >>
>> >> * Let the numbers run separately, starting from 2.2.0.
>> >> * Tie the numbers together with an additional level of versioning
>> >> (e.g. 2.2.0.0).
>> >>
>> >> I think that letting the two version numbers diverge is better in the
>> >> long term. For example, if you need to make an incompatible change, it
>> >> is pretty ugly to do it as a fourth-level version number (e.g. an
>> >> incompatible change from 2.2.0.0 to 2.2.0.1). At the beginning, I expect
>> >> that storage-api would move faster than Hive, but as it stabilizes I
>> >> expect it might start moving slower than Hive.
>> >>
>> >> I'd propose that we have Hive's build use a released version of
>> >> storage-api rather than a snapshot.
>> >>
>> >> Thoughts?
>> >>
>> >>    Owen
>> >>
>> >>
>> >> > Alan.
>> >> >
>> >> > > On Aug 15, 2016, at 17:01, Owen O'Malley <omal...@apache.org> wrote:
>> >> > >
>> >> > > All,
>> >> > >
>> >> > > As part of moving ORC out of Hive, we pulled all of the
>> >> > > vectorization storage and sarg (SearchArgument) classes into a
>> >> > > separate module, which is named storage-api. Although it is
>> >> > > currently only used by ORC, it could be used by Parquet or Avro if
>> >> > > they wanted to make a fast vectorized reader that reads directly
>> >> > > into Hive's VectorizedRowBatch without needing a shim or data copy.
>> >> > > Note that this is in many ways similar to pulling the Arrow project
>> >> > > out of Drill.
>> >> > >
>> >> > > This unfortunately still leaves us with a circular dependency
>> >> > > between Hive and ORC. I'd hoped that storage-api wouldn't change
>> >> > > that much, but it keeps changing. As a result, ORC ends up shipping
>> >> > > its own fork of storage-api.
>> >> > >
>> >> > > Although we could make a new project for just the storage-api, I
>> >> > > think it would be better to make it a subproject of Hive that is
>> >> > > released independently.
>> >> > >
>> >> > > What do others think?
>> >> > >
>> >> > >   Owen
>> >> >
>> >> >
>> >>
>>
>>