I just logged HIVE-25995 [1]; this might be another blocker for 4.0.0-alpha-1. If that's not the case feel free to change the priority of the ticket.
[1] https://issues.apache.org/jira/browse/HIVE-25995

On Tue, Mar 1, 2022 at 7:29 PM Alessandro Solimando <alessandro.solima...@gmail.com> wrote:

> Hi Sungwoo,
> last time I tried to run a TPCDS-based benchmark I stumbled upon a similar
> situation. In the end I found that statistics were not computed, so CBO was
> not kicking in, and the automatic retry runs with CBO off, which was failing
> for something like 10 queries (subqueries cannot be decorrelated, but also
> some runtime errors).
>
> Making sure that (column) statistics were correctly computed fixed the
> problem.
>
> Can you check if this is the case for you?
>
> HTH,
> Alessandro
>
> On Tue, 1 Mar 2022 at 15:28, POSTECH CT <c...@pl.postech.ac.kr> wrote:
>
> > Hello Hive team,
> >
> > I wonder if anyone in the Hive team has tried the TPC-DS benchmark on
> > the master branch recently. We occasionally run TPC-DS system tests
> > using the master branch, and the tests don't succeed completely. Here
> > is how our TPC-DS tests proceed.
> >
> > 1. Compile and run Hive on Tez (not Hive-LLAP)
> > 2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics
> > 3. Run 99 TPC-DS queries which were slightly modified to return
> >    varying number of rows (rather than 100 rows)
> > 4. Compare the results against the previous results
> >
> > The previous results were obtained and cross-checked by running Hive
> > 3.1.2 and SparkSQL 2.3/3.2, so we are fairly confident about their
> > correctness.
> >
> > For the latest commit in the master branch, step 2 fails. For earlier
> > commits (for example, commits in February 2021), step 3 fails where
> > several queries either fail or return wrong results.
> >
> > We can compile and report the test results in this mailing list, but
> > would like to know if similar results have been reproduced by the Hive
> > team, in order to make sure that we did not make errors in our tests.
> >
> > If it is okay to open a JIRA ticket that only reports failures in the
> > TPC-DS test, we could also perform git bisect to locate the commit
> > that began to generate wrong results.
> >
> > --- Sungwoo Park
> >
> > On Tue, 1 Mar 2022, Zoltan Haindrich wrote:
> >
> > > Hey,
> > >
> > > Great to hear that we are on the same side regarding these things :)
> > >
> > > For around a week now - we have nightly builds for the master branch:
> > > http://ci.hive.apache.org/job/hive-nightly/12/
> > >
> > > I think we have 1 blocker issue:
> > > https://issues.apache.org/jira/browse/HIVE-25665
> > >
> > > I know about one more thing I would rather get fixed before we release it:
> > > https://issues.apache.org/jira/browse/HIVE-25994
> > > The best would be to introduce smoke tests (HIVE-22302) to ensure that
> > > something like this will not happen in the future - but we should probably
> > > start moving forward.
> > >
> > > I think we could call the first iteration of this "4.0.0-alpha-1" :)
> > >
> > > I've added 4.0.0-alpha-1 as a version - and added the above two tickets to it.
> > >
> > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1
> > >
> > > Are there any more things you guys know which would be needed?
> > >
> > > cheers,
> > > Zoltan
> > >
> > >
> > > On 2/22/22 12:18 PM, Peter Vary wrote:
> > >> I would vote for 4.0.0-alpha-1 or similar for all of the components.
> > >>
> > >> When we have more stable releases I would keep the 4.x.x schema, since
> > >> everyone is familiar with it, and I do not see a really good reason to
> > >> change it.
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >>
> > >>> On 2022. Feb 10., at 3:34, Szehon Ho <szehon.apa...@gmail.com> wrote:
> > >>>
> > >>> +1 that would be awesome to see Hive master released after so long.
> > >>>
> > >>> Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would pick
> > >>> any 3.x or calendar date (which could tend to slip and be more
> > >>> confusing?).
> > >>>
> > >>> Thanks in any case for getting the ball rolling.
> > >>> Szehon
> > >>>
> > >>> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich <k...@rxd.hu> wrote:
> > >>>
> > >>>> Hey,
> > >>>>
> > >>>> Thank you guys for chiming in; versioning is for sure something we should
> > >>>> get to some common ground.
> > >>>> It's a triple problem right now; I think we have the following things:
> > >>>> * storage-api
> > >>>> ** we have "2.7.3-SNAPSHOT" in the repo
> > >>>> *** https://github.com/apache/hive/blob/0d1cffffc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
> > >>>> ** meanwhile we already have 2.8.1 released to maven central
> > >>>> *** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
> > >>>> * standalone-metastore
> > >>>> ** 4.0.0-SNAPSHOT in the repo
> > >>>> ** last release is 3.1.2
> > >>>> * hive
> > >>>> ** 4.0.0-SNAPSHOT in the repo
> > >>>> ** last release is 3.1.2
> > >>>>
> > >>>> Regarding the actual version number I'm not entirely sure where we should
> > >>>> start the numbering - that's why I was referring to it as Hive-X in my
> > >>>> first letter.
> > >>>>
> > >>>> I think the key point here would be to start shipping releases regularly
> > >>>> and not the actual version number we will use - I'm kinda open to any
> > >>>> versioning scheme which reflects that this is a newer release than 3.1.2.
> > >>>>
> > >>>> I could imagine the following ones:
> > >>>> (A) start with something less expected; but keep 3 in the prefix to
> > >>>>     reflect that this is not yet 4.0
> > >>>>     I can imagine the following numbers:
> > >>>>     3.900.0, 3.901.0, ...
> > >>>>     3.9.0, 3.9.1, ...
> > >>>> (B) start 4.0.0
> > >>>>     4.0.0, 4.1.0, ...
> > >>>> (C) jump to some calendar based version number like 2022.2.9
> > >>>>     trunk based development has pros and cons...making a move like this
> > >>>>     irreversibly pledges trunk based development; and makes release branches
> > >>>>     hard to introduce
> > >>>> (X) somewhat orthogonal is to (also) use some suffixes
> > >>>>     4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
> > >>>>     this is probably the most tempting to use - but this versioning
> > >>>>     schema with a non-changing MINOR and PATCH number will
> > >>>>     also suggest that the actual software is fully compatible - and only
> > >>>>     bugs are being fixed - which will not be true...
> > >>>>
> > >>>> I really like the idea to suffix these releases with alpha or beta - which
> > >>>> will communicate our level of commitment: that these are not 100% production
> > >>>> ready artifacts.
> > >>>>
> > >>>> I think we could fix HIVE-25665; and probably experiment with 4.0.0-alpha1
> > >>>> for a start...
> > >>>>
> > >>>>> This also means there should *not* be a branch-4 after releasing Hive 4.0
> > >>>>> and let that diverge (and becomes the next, super-ignored branch-3),
> > >>>> correct; no need to keep a branch we don't maintain...but in any case I
> > >>>> think we can postpone this decision until there is something to
> > >>>> release... :)
> > >>>>
> > >>>> cheers,
> > >>>> Zoltan
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 2/9/22 10:23 AM, László Bodor wrote:
> > >>>>> Hi All!
> > >>>>>
> > >>>>> A purely technical question: what will the SNAPSHOT version become after
> > >>>>> releasing Hive 4.0.0? I think this is important, as it defines and reflects
> > >>>>> the future release plans.
> > >>>>>
> > >>>>> Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 + branch-3.
> > >>>>> Hive is an evolving and super-active project: if we want to make regular
> > >>>>> releases, we should simply release Hive 4.0 and bump pom to 4.1.0-SNAPSHOT,
> > >>>>> which clearly says that we can release Hive 4.1 anytime we want, without
> > >>>>> being frustrated about "whether we included enough cool stuff to release
> > >>>>> 5.0".
> > >>>>>
> > >>>>> This also means there should *not* be a branch-4 after releasing Hive 4.0
> > >>>>> and let that diverge (and becomes the next, super-ignored branch-3), only
> > >>>>> when we end up bringing a minor backward-incompatible thing that needs a
> > >>>>> 4.0.x, and when it happens, we'll create *branch-4.0* on demand. For me, a
> > >>>>> branch called *branch-4.0* doesn't imply either I can expect cool releases
> > >>>>> in the future from there or the branch is maintained and tries to be in
> > >>>>> sync with the *master*.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Laszlo Bodor
> > >>>>>
> > >>>>> Alessandro Solimando <alessandro.solima...@gmail.com> wrote (on Tue, Feb 8, 2022, 16:42):
> > >>>>>
> > >>>>>> Hello everyone,
> > >>>>>> thank you for starting this discussion.
> > >>>>>>
> > >>>>>> I agree that releasing the master branch regularly and sufficiently often
> > >>>>>> is welcome and vital for the health of the community.
> > >>>>>>
> > >>>>>> It would be great to hear from others too, especially PMC members and
> > >>>>>> committers, but even simple contributors/followers like myself.
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>> Alessandro
> > >>>>>>
> > >>>>>> On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis <zabe...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>> Thanks for starting the discussion Zoltan.
> > >>>>>>>
> > >>>>>>> I strongly believe that it is important to have regular and frequent
> > >>>>>>> releases, otherwise people will create and maintain separate Hive forks.
> > >>>>>>> The latter is not good for the project and the community may lose valuable
> > >>>>>>> members because of it.
> > >>>>>>>
> > >>>>>>> Going forward I fully agree that there is no point bringing up strong
> > >>>>>>> blockers for the next release. For sure there are many backward
> > >>>>>>> incompatible changes and possibly unstable features but unless we get a
> > >>>>>>> release out it will be difficult to determine what is broken and what needs
> > >>>>>>> to be fixed.
> > >>>>>>>
> > >>>>>>> Due to the big number of changes that are going to appear in the next
> > >>>>>>> version I would suggest using the terms Hive X-alpha, Hive X-beta for the
> > >>>>>>> first few releases. This will make it clear to the end users that they need
> > >>>>>>> to be careful when upgrading from an older version and it will give us a
> > >>>>>>> bit more time and freedom to treat issues that the users will likely
> > >>>>>>> discover.
> > >>>>>>>
> > >>>>>>> The only real blocker that we may want to treat is HIVE-25665 [1] but we
> > >>>>>>> can continue the discussion under that ticket and re-evaluate if necessary.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Stamatis
> > >>>>>>>
> > >>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-25665
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich <k...@rxd.hu> wrote:
> > >>>>>>>
> > >>>>>>>> Hey All,
> > >>>>>>>>
> > >>>>>>>> We haven't made a release for a long time now (3.1.2 was released on 26
> > >>>>>>>> August 2019) - and I think because we didn't make that many branch-3
> > >>>>>>>> releases, not too many fixes were ported there - which made that release
> > >>>>>>>> branch kinda erode away.
> > >>>>>>>>
> > >>>>>>>> We have a lot of new features/changes in the current master.
> > >>>>>>>> I think instead of aiming for big feature-packed releases we should aim
> > >>>>>>>> for making a regular release every few months - we should make regular
> > >>>>>>>> releases which people could install and use.
> > >>>>>>>> After all, releasing Hive after more than 2 years would be a big step
> > >>>>>>>> forward in itself - we have so many improvements that I can't even
> > >>>>>>>> count...
> > >>>>>>>>
> > >>>>>>>> But I may not know every aspect of the project / the state of some
> > >>>>>>>> internal features - so I would like to ask you:
> > >>>>>>>> What would be the bare minimum requirements before we could release the
> > >>>>>>>> current master as Hive X?
> > >>>>>>>>
> > >>>>>>>> There are many nice-to-have-s like:
> > >>>>>>>> * hadoop upgrade
> > >>>>>>>> * jdk11
> > >>>>>>>> * remove HoS or MR
> > >>>>>>>> * ?
> > >>>>>>>> but I don't think these are blockers...we can make any of these in the
> > >>>>>>>> next release if we start making them...
> > >>>>>>>>
> > >>>>>>>> cheers,
> > >>>>>>>> Zoltan