Very nice. There was an effort to get fast and green builds back in 2016. There wasn't any strict "must be a green build" requirement before commit at the time, though. Instead, jiras were filed and the expectation was that they'd be cited, or new ones created, pre-commit (looking at the jiras now - this was likely followed for a while, with many fixes, and eventually got annoying?). I think the enforcement step is absolutely required to get to, and maintain, a green build. We may also want to consider the performance characteristics of tests - each must complete within X seconds. Jiras for reference (including test infra improvements which were not done at the time): HIVE-13503, HIVE-15058, HIVE-14547
This will be painful initially, but eventually it'll be great to be able to commit without having to scan through a bunch of 'known failures', analyze, document, etc.

On Tue, May 15, 2018 at 5:30 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:
> Wow! Awesome. This is the 3rd time I remember seeing a green run in >4 yrs. :)
>
> Thanks
> Prasanth

On May 15, 2018, at 5:28 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
> We have just had the first clean run in a while:
> https://builds.apache.org/job/PreCommit-HIVE-Build/10971/testReport/
>
> I will continue monitoring follow-up runs.
>
> Thanks,
> -Jesús

On 5/14/18, 11:28 PM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
> Wondering if we can add a state transition from "Patch Available" to "Ready To Commit" which can only be triggered by the ptest bot on a green test run.
>
> Thanks
> Prasanth

On Mon, May 14, 2018 at 10:44 PM -0700, "Jesus Camacho Rodriguez" <jcama...@apache.org> wrote:
> I have been working on fixing this situation while commits were still coming in.
>
> All the tests that have been disabled are in:
> https://issues.apache.org/jira/browse/HIVE-19509
> I have created new issues to reenable each of them; they are linked to that issue. Maybe I was slightly aggressive in disabling some of the tests, but that seemed to be the only way to bring the test failures with age count > 1 to zero.
>
> Instead of starting a vote to freeze the commits in another thread, I will start a vote to be stricter wrt committing to master, i.e., only commit if we get a clean QA run.
>
> We can discuss more about this issue over there.
>
> Thanks,
> Jesús

On 5/14/18, 4:11 PM, "Sergey Shelukhin" wrote:
> Can we please make this freeze conditional, i.e. we unfreeze automatically after ptest is clean (as evidenced by a clean HiveQA run on a given JIRA).

On 18/5/14, 15:16, "Alan Gates" wrote:
> We should do it in a separate thread so that people can see it with the [VOTE] subject. Some people use that as a filter in their email to know when to pay attention to things.
>
> Alan.

On Mon, May 14, 2018 at 2:36 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:
> Will there be a separate voting thread? Or is the voting on this thread sufficient for the lockdown?
>
> Thanks
> Prasanth

On May 14, 2018, at 2:34 PM, Alan Gates wrote:
> I see there's support for this, but people are still pouring in commits. I propose we have a quick vote on this to lock down the commits until we get to green. That way everyone knows we have drawn the line at a specific point. Any commits after that point would be reverted. There isn't a category in the bylaws that fits this kind of vote, but I suggest lazy majority as the most appropriate one (at least 3 votes, more +1s than -1s).
>
> Alan.

On Mon, May 14, 2018 at 10:34 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
> I worked on a few quick-fix optimizations in the Ptest infrastructure over the weekend which reduced the execution time from ~90 min to ~70 min per run. I had to restart Ptest multiple times. I was resubmitting the patches which were in the queue manually, but I may have missed a few. In case you have a patch which is pending pre-commit and you don't see it in the queue, please submit it manually or let me know if you don't have access to the jenkins job. I will continue to work on the sub-tasks in HIVE-19425 and will do some maintenance next weekend as well.

On Mon, May 14, 2018 at 7:42 AM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
> Vineet has already been working on disabling those tests that were timing out. I am working on disabling those that have been generating different q files consistently for the last n ptest runs. I am keeping track of all these tests in https://issues.apache.org/jira/browse/HIVE-19509.
>
> -Jesús

On 5/14/18, 2:25 AM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
> +1 on freezing commits until we get repeated green test runs. We should probably disable tests that are flaky (and remember in a jira to reenable them at a later point) to get repeated green test runs.
>
> Thanks
> Prasanth

On Mon, May 14, 2018 at 2:15 AM -0700, "Rui Li" <lirui.fu...@gmail.com> wrote:
> +1 to freezing commits until we stabilize
>
> --
> Best regards!
> Rui Li

On Sat, May 12, 2018 at 6:10 AM, Vihang Karajgaonkar wrote:
> In order to understand the end-to-end precommit flow I would like to get access to the PreCommit-HIVE-Build jenkins script. Does anyone know how I can get that?

On Fri, May 11, 2018 at 2:03 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
> Bq. For the short term green runs, I think we should @Ignore the tests which are known to be failing for many runs. They are not being addressed anyway. If people think they are important to run, we should fix them and only then re-enable them.
>
> I think that is a good idea, as we would minimize the time that we halt development. We can create a JIRA where we list all tests that were failing and that we have disabled to get the clean run. From that moment on, we will have zero tolerance towards committing with failing tests. And we need to pick up those tests that should not be ignored and bring them back, but passing. If there is no disagreement, I can start working on that.
>
> Once I am done, I can try to help with the infra tickets too.
>
> -Jesús

On 5/11/18, 1:57 PM, "Vineet Garg" wrote:
> +1. I strongly vote for freezing commits and getting our test coverage into an acceptable state. We have been struggling to stabilize branch-3 due to test failures, and releasing Hive 3.0 in the current state would be unacceptable.
>
> Currently there are quite a few test suites which are not even running and are being timed out. We have been committing patches (to both branch-3 and master) without test coverage for these tests. We should immediately figure out what's going on before we proceed with commits.
>
> For reference, the following test suites are timing out on master (https://issues.apache.org/jira/browse/HIVE-19506):
>
> TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out)
> TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out)
> TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out)
> TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out)
> TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out)
> TestTxnExIm - did not produce a TEST-*.xml file (likely timed out)
>
> Vineet

On May 11, 2018, at 1:46 PM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
> +1 There are many problems with the test infrastructure and in my opinion it has now become the number one bottleneck for the project. I was looking at the infrastructure yesterday and I think the current infrastructure (even with its own set of problems) is still under-utilized. To start with, I am planning to increase the number of threads that process the parallel test batches. It needs a restart on the server side. I can do it now, if folks are okay with it. Else I can do it over the weekend when the queue is small.
>
> I listed the improvements which I thought would be useful under https://issues.apache.org/jira/browse/HIVE-19425 but frankly speaking I am not able to devote as much time as I would like to it. I would appreciate it if folks who have some more time can help out.
>
> I think that, to start with, https://issues.apache.org/jira/browse/HIVE-19429 will help a lot. We need to pack more test runs in parallel, and containers provide good isolation.
>
> For the short term green runs, I think we should @Ignore the tests which are known to be failing for many runs. They are not being addressed anyway. If people think they are important to run, we should fix them and only then re-enable them.
>
> Also, I feel we need a light-weight test run which we can run locally before submitting a patch for the full suite. That way minor issues with the patch can be handled locally. Maybe create a profile which runs a subset of important tests which are consistent. We can apply some label indicating that the pre-checkin local tests ran successfully, and only then submit for the full suite.
>
> More thoughts are welcome. Thanks for starting this conversation.

On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
> I believe we have reached a state (maybe we did reach it a while ago) that is not sustainable anymore, as there are so many tests failing / timing out that it is not possible to verify whether a patch is breaking some critical parts of the system or not. It also seems to me that due to the timeouts (maybe due to infra, maybe not), ptest runs are taking even longer than usual, which in turn creates an even longer queue of patches.
>
> There is an ongoing effort to improve ptest usability (https://issues.apache.org/jira/browse/HIVE-19425), but apart from that, we need to make an effort to stabilize existing tests and bring that failure count to zero.
>
> Hence, I am suggesting *we stop committing any patch before we get a green run*. If someone thinks this proposal is too radical, please come up with an alternative, because I do not think it is OK to have the ptest runs in their current state. Other projects of a certain size (e.g., Hadoop, Spark) are always green; we should be able to do the same.
>
> Finally, once we get to zero failures, I suggest we are less tolerant of committing without getting a clean ptest run. If there is a failure, we need to fix it or revert the patch that caused it, then we continue developing.
>
> Please, let's all work together as a community to fix this issue; that is the only way to get to zero quickly.
>
> Thanks,
> Jesús
>
> PS. I assume the flaky tests will come into the discussion. Let's see first how many of those we have, then we can work to find a fix.
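
For reference, the @Ignore approach discussed above is mechanically simple in a JUnit 4 suite. A minimal sketch, assuming a hypothetical test class and a placeholder follow-up JIRA (only HIVE-19509 is taken from the thread; nothing below is an actual Hive test):

```java
import static org.junit.Assert.assertTrue;

import org.junit.Ignore;
import org.junit.Test;

// Hypothetical test class used only to illustrate the pattern.
public class TestFlakyExample {

  // Disabled as part of the green-build effort tracked in HIVE-19509;
  // a linked follow-up JIRA (HIVE-XXXXX, placeholder) covers re-enabling it
  // once the underlying flakiness is fixed.
  @Ignore("Flaky; disabled under HIVE-19509, re-enable via linked follow-up JIRA")
  @Test
  public void testSomethingFlaky() {
    assertTrue(true); // placeholder body; the real assertions stay in place
  }
}
```

Keeping the JIRA reference in the @Ignore reason string keeps it visible in test reports, which makes the later "bring them back, but passing" step easier to track.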
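
Similarly, the "must complete within X seconds" idea from the first message could be enforced with a per-class timeout rule in JUnit 4; a sketch with a made-up class name and an arbitrary 60-second budget:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.Timeout;

// Hypothetical test class; the 60-second limit is an example value only.
public class TestWithTimeBudget {

  // Fails any test method in this class that runs longer than 60 seconds,
  // instead of letting it hang the whole batch.
  @Rule
  public Timeout perTestTimeout = Timeout.seconds(60);

  @Test
  public void testFinishesQuickly() {
    assertEquals(4, 2 + 2); // trivial placeholder assertion
  }
}
```

Per-method limits via @Test(timeout = ...) work as well; a class-level rule is just easier to apply uniformly across a suite.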