I just logged HIVE-25995 [1]; this might be another blocker for
4.0.0-alpha-1. If that's not the case feel free to change the priority of
the ticket.

[1] https://issues.apache.org/jira/browse/HIVE-25995

On Tue, Mar 1, 2022 at 7:29 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi Sungwoo,
> last time I tried to run a TPC-DS-based benchmark I stumbled upon a similar
> situation. I eventually found that statistics were not computed, so CBO was
> not kicking in, and the automatic retry ran with CBO off, which failed
> for something like 10 queries (subqueries that cannot be decorrelated, but
> also some runtime errors).
>
> Making sure that (column) statistics were correctly computed fixed the
> problem.
>
> Can you check if this is the case for you?
>
> HTH,
> Alessandro
>
> On Tue, 1 Mar 2022 at 15:28, POSTECH CT <c...@pl.postech.ac.kr> wrote:
>
> > Hello Hive team,
> >
> > I wonder if anyone in the Hive team has tried the TPC-DS benchmark on
> > the master branch recently.  We occasionally run TPC-DS system tests
> > using the master branch, and the tests don't succeed completely. Here
> > is how our TPC-DS tests proceed.
> >
> > 1. Compile and run Hive on Tez (not Hive-LLAP)
> > 2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
> > 3. Run 99 TPC-DS queries which were slightly modified to return
> > varying number of rows (rather than 100 rows)
> > 4. Compare the results against the previous results
> >
> > The previous results were obtained and cross-checked by running Hive
> > 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their
> > correctness.
> >
> > For the latest commit in the master branch, step 2 fails. For earlier
> > commits (for example, commits in February 2021), step 3 fails where
> > several queries either fail or return wrong results.
> >
> > We can compile and report the test results in this mailing list, but
> > would like to know if similar results have been reproduced by the Hive
> > team, in order to make sure that we did not make errors in our tests.
> >
> > If it is okay to open a JIRA ticket that only reports failures in the
> > TPC-DS test, we could also perform git bisect to locate the commit
> > that began to generate wrong results.
> >
> > --- Sungwoo Park
> >
> > On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
> >
> > > Hey,
> > >
> > > Great to hear that we are on the same side regarding these things :)
> > >
> > > For around a week now - we have nightly builds for the master branch:
> > > http://ci.hive.apache.org/job/hive-nightly/12/
> > >
> > > I think we have 1 blocker issue:
> > > https://issues.apache.org/jira/browse/HIVE-25665
> > >
> > > I know about one more thing I would rather get fixed before we release
> > it:
> > > https://issues.apache.org/jira/browse/HIVE-25994
> > > The best would be to introduce smoke tests (HIVE-22302) to ensure that
> > > something like this will not happen in the future - but we should
> > probably
> > > start moving forward.
> > >
> > > I think we could call the first iteration of this as "4.0.0-alpha-1" :)
> > >
> > > I've added 4.0.0-alpha-1 as a version - and added the above two tickets
> > > to it.
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1
> > >
> > > Are there any more things you guys know which would be needed?
> > >
> > > cheers,
> > > Zoltan
> > >
> > >
> > > On 2/22/22 12:18 PM, Peter Vary wrote:
> > >> I would vote for 4.0.0-alpha-1 or similar for all of the components.
> > >>
> > >> When we have more stable releases I would keep the 4.x.x schema, since
> > >> everyone is familiar with it, and I do not see a really good reason to
> > >> change it.
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >>
> > >>> On 2022. Feb 10., at 3:34, Szehon Ho <szehon.apa...@gmail.com>
> wrote:
> > >>>
> > >>> +1 that would be awesome to see Hive master released after so long.
> > >>>
> > >>> Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would
> > pick
> > >>> any 3.x or calendar date (which could tend to slip and be more
> > >>> confusing?).
> > >>>
> > >>> Thanks in any case to get the ball rolling.
> > >>> Szehon
> > >>>
> > >>> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> > >>>
> > >>>> Hey,
> > >>>>
> > >>>> Thank you guys for chiming in; versioning is for sure something we
> > should
> > >>>> get to some common ground.
> > >>>> It's a triple problem right now; I think we have the following
> > >>>> things:
> > >>>> * storage-api
> > >>>> ** we have "2.7.3-SNAPSHOT" in the repo
> > >>>> ***
> > >>>>
> >
> https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
> > >>>> ** meanwhile we already have 2.8.1 released to maven central
> > >>>> ***
> > https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
> > >>>> * standalone-metastore
> > >>>> ** 4.0.0-SNAPSHOT in the repo
> > >>>> ** last release is 3.1.2
> > >>>> * hive
> > >>>> ** 4.0.0-SNAPSHOT in the repo
> > >>>> ** last release is 3.1.2
> > >>>>
> > >>>> Regarding the actual version number I'm not entirely sure where we
> > should
> > >>>> start the numbering - that's why I was referring to it as Hive-X in
> my
> > >>>> first letter.
> > >>>>
> > >>>> I think the key point here would be to start shipping releases
> > >>>> regularly and not the actual version number we will use - I'm kinda
> > >>>> open to any versioning scheme which reflects that this is a newer
> > >>>> release than 3.1.2.
> > >>>>
> > >>>> I could imagine the following ones:
> > >>>> (A) start with something less expected; but keep 3 in the prefix to
> > >>>> reflect that this is not yet 4.0
> > >>>>      I can imagine the following numbers:
> > >>>>      3.900.0, 3.901.0, ...
> > >>>>      3.9.0, 3.9.1, ...
> > >>>> (B) start 4.0.0
> > >>>>      4.0.0, 4.1.0, ...
> > >>>> (C) jump to some calendar based version number like 2022.2.9
> > >>>>      trunk based development has pros and cons...making a move like
> > this
> > >>>> irreversibly pledges trunk based development; and makes release
> > branches
> > >>>> hard to introduce
> > >>>> (X) somewhat orthogonal is to (also) use some suffixes
> > >>>>      4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
> > >>>>      this is probably the most tempting to use - but this versioning
> > >>>> schema with a non-changing MINOR and PATCH number will
> > >>>>      also suggest that the actual software is fully compatible - and
> > only
> > >>>> bugs are being fixed - which will not be true...
> > >>>>
> > >>>> I really like the idea to suffix these releases with alpha or beta -
> > >>>> which will communicate our level of commitment, that these are not
> > >>>> 100% production ready artifacts.
> > >>>>
> > >>>> I think we could fix HIVE-25665; and probably experiment with
> > >>>> 4.0.0-alpha1
> > >>>> for start...
> > >>>>
> > >>>>> This also means there should *not* be a branch-4 after releasing
> Hive
> > >>>> 4.0
> > >>>>> and let that diverge (and becomes the next, super-ignored
> branch-3),
> > >>>> correct; no need to keep a branch we don't maintain...but in any
> case
> > I
> > >>>> think we can postpone this decision until there will be something to
> > >>>> release... :)
> > >>>>
> > >>>> cheers,
> > >>>> Zoltan
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 2/9/22 10:23 AM, László Bodor wrote:
> > >>>>> Hi All!
> > >>>>>
> > >>>>> A purely technical question: what will the SNAPSHOT version become
> > after
> > >>>>> releasing Hive 4.0.0? I think this is important, as it defines and
> > >>>> reflects
> > >>>>> the future release plans.
> > >>>>>
> > >>>>> Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 +
> > branch-3.
> > >>>>> Hive is an evolving and super-active project: if we want to make
> > regular
> > >>>>> releases, we should simply release Hive 4.0 and bump pom to
> > >>>> 4.1.0-SNAPSHOT,
> > >>>>> which clearly says that we can release Hive 4.1 anytime we want,
> > without
> > >>>>> being frustrated about "whether we included enough cool stuff to
> > release
> > >>>>> 5.0".
> > >>>>>
> > >>>>> This also means there should *not* be a branch-4 after releasing
> > Hive
> > >>>>> 4.0
> > >>>>> and let that diverge (and becomes the next, super-ignored
> branch-3),
> > >>>>> only
> > >>>>> when we end up bringing a minor backward-incompatible thing that
> > needs a
> > >>>>> 4.0.x, and when it happens, we'll create *branch-4.0 *on demand.
> For
> > me,
> > >>>> a
> > >>>>> branch called *branch-4.0* doesn't imply either I can expect cool
> > >>>> releases
> > >>>>> in the future from there or the branch is maintained and tries to
> be
> > in
> > >>>>> sync with the *master*.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Laszlo Bodor
> > >>>>>
> > >>>>> Alessandro Solimando <alessandro.solima...@gmail.com> wrote (on
> > >>>>> Tue, Feb 8, 2022, 16:42):
> > >>>>>
> > >>>>>> Hello everyone,
> > >>>>>> thank you for starting this discussion.
> > >>>>>>
> > >>>>>> I agree that releasing the master branch regularly and
> sufficiently
> > >>>> often
> > >>>>>> is welcome and vital for the health of the community.
> > >>>>>>
> > >>>>>> It would be great to hear from others too, especially PMC members
> > and
> > >>>>>> committers, but even simple contributors/followers as myself.
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>> Alessandro
> > >>>>>>
> > >>>>>> On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <
> zabe...@gmail.com
> > >
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>> Thanks for starting the discussion Zoltan.
> > >>>>>>>
> > >>>>>>> I strongly believe that it is important to have regular and frequent
> > >>>>>>> releases; otherwise people will create and maintain separate Hive forks.
> > >>>>>>> The latter is not good for the project and the community may lose
> > >>>>>> valuable
> > >>>>>>> members because of it.
> > >>>>>>>
> > >>>>>>> Going forward I fully agree that there is no point bringing up
> > strong
> > >>>>>>> blockers for the next release. For sure there are many backward
> > >>>>>>> incompatible changes and possibly unstable features but unless we
> > get
> > >>>>>>> a
> > >>>>>>> release out it will be difficult to determine what is broken and
> > what
> > >>>>>> needs
> > >>>>>>> to be fixed.
> > >>>>>>>
> > >>>>>>> Due to the big number of changes that are going to appear in the
> > next
> > >>>>>>> version I would suggest using the terms Hive X-alpha, Hive X-beta
> > for
> > >>>> the
> > >>>>>>> first few releases. This will make it clear to the end users that
> > they
> > >>>>>> need
> > >>>>>>> to be careful when upgrading from an older version and it will
> > give us
> > >>>> a
> > >>>>>>> bit more time and freedom to treat issues that the users will
> > likely
> > >>>>>>> discover.
> > >>>>>>>
> > >>>>>>> The only real blocker that we may want to treat is HIVE-25665 [1], but
> > >>>>>>> we can continue the discussion under that ticket and re-evaluate if
> > >>>>>>> necessary.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Stamatis
> > >>>>>>>
> > >>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-25665
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <k...@rxd.hu>
> > wrote:
> > >>>>>>>
> > >>>>>>>> Hey All,
> > >>>>>>>>
> > >>>>>>>> We haven't made a release for a long time now (3.1.2 was released on
> > >>>>>>>> 26 August 2019) - and I think because we didn't make that many
> > >>>>>>>> branch-3 releases, not too many fixes were ported there - which made
> > >>>>>>>> that release branch kinda erode away.
> > >>>>>>>>
> > >>>>>>>> We have a lot of new features/changes in the current master.
> > >>>>>>>> I think instead of aiming for big feature-packed releases we
> > should
> > >>>> aim
> > >>>>>>>> for making a regular release every few months - we should make
> > >>>>>>>> regular
> > >>>>>>>> releases which people could
> > >>>>>>>> install and use.
> > >>>>>>>> After all, releasing Hive after more than 2 years would be a big step
> > >>>>>>>> forward in itself - we have so many improvements that I can't even
> > >>>>>>>> count...
> > >>>>>>>>
> > >>>>>>>> But I may not know every aspect of the project / the state of some
> > >>>>>>>> internal features - so I would like to ask you:
> > >>>>>>>> What would be the bare minimum requirements before we could
> > release
> > >>>> the
> > >>>>>>>> current master as Hive X?
> > >>>>>>>>
> > >>>>>>>> There are many nice-to-have-s like:
> > >>>>>>>> * hadoop upgrade
> > >>>>>>>> * jdk11
> > >>>>>>>> * remove HoS or MR
> > >>>>>>>> * ?
> > >>>>>>>> but I don't think these are blockers...we can make any of these
> > in
> > >>>>>>>> the
> > >>>>>>>> next release if we start making them...
> > >>>>>>>>
> > >>>>>>>> cheers,
> > >>>>>>>> Zoltan
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>
> > >
> >
>
