Will there be a separate voting thread? Or is the voting on this thread sufficient for the lockdown?
Thanks
Prasanth

> On May 14, 2018, at 2:34 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
> I see there's support for this, but people are still pouring in commits.
> I propose we have a quick vote on this to lock down the commits until we
> get to green. That way everyone knows we have drawn the line at a specific
> point. Any commits after that point would be reverted. There isn't a
> category in the bylaws that fits this kind of vote, but I suggest lazy
> majority as the most appropriate one (at least 3 votes, more +1s than -1s).
>
> Alan.
>
>> On Mon, May 14, 2018 at 10:34 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
>>
>> I worked on a few quick-fix optimizations in the Ptest infrastructure over
>> the weekend which reduced the execution time from ~90 min to ~70 min per
>> run. I had to restart Ptest multiple times. I was resubmitting the patches
>> which were in the queue manually, but I may have missed a few. In case you
>> have a patch which is pending pre-commit and you don't see it in the queue,
>> please submit it manually, or let me know if you don't have access to the
>> Jenkins job. I will continue to work on the sub-tasks in HIVE-19425 and
>> will do some maintenance next weekend as well.
>>
>>> On Mon, May 14, 2018 at 7:42 AM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>
>>> Vineet has already been working on disabling those tests that were timing
>>> out. I am working on disabling those that have been generating different
>>> q files consistently over the last n ptest runs. I am keeping track of all
>>> these tests in https://issues.apache.org/jira/browse/HIVE-19509.
>>>
>>> -Jesús
>>>
>>> On 5/14/18, 2:25 AM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
>>>
>>> +1 on freezing commits until we get repeated green test runs. We should
>>> probably disable tests that are flaky (and record them in a JIRA so we
>>> remember to re-enable them at a later point) to get repeated green test
>>> runs.
>>>
>>> Thanks
>>> Prasanth
>>>
>>> On Mon, May 14, 2018 at 2:15 AM -0700, "Rui Li" <lirui.fu...@gmail.com> wrote:
>>>
>>> +1 to freezing commits until we stabilize.
>>>
>>>> On Sat, May 12, 2018 at 6:10 AM, Vihang Karajgaonkar wrote:
>>>>
>>>> In order to understand the end-to-end pre-commit flow, I would like to
>>>> get access to the PreCommit-HIVE-Build Jenkins script. Does anyone know
>>>> how I can get that?
>>>>
>>>>> On Fri, May 11, 2018 at 2:03 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>>>
>>>>> Bq. For the short-term green runs, I think we should @Ignore the tests
>>>>> which have been failing for many runs. They are not really being
>>>>> addressed anyway. If people think they are important to run, we should
>>>>> fix them and only then re-enable them.
>>>>>
>>>>> I think that is a good idea, as we would minimize the time that we halt
>>>>> development. We can create a JIRA where we list all the tests that were
>>>>> failing and that we have disabled to get the clean run. From that moment,
>>>>> we will have zero tolerance towards committing with failing tests. And we
>>>>> need to pick out those tests that should not stay ignored and bring them
>>>>> back, but passing. If there is no disagreement, I can start working on
>>>>> that.
>>>>>
>>>>> Once I am done, I can try to help with the infra tickets too.
>>>>>
>>>>> -Jesús
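For illustration, a minimal sketch of the disable-and-track approach discussed above, assuming JUnit 4; the test class and method names here are hypothetical, not existing Hive code, and only the HIVE-19509 tracking JIRA is taken from the thread:

    import org.junit.Ignore;
    import org.junit.Test;

    public class TestExampleCliDriver {

      // Disabled to get to a repeatable green run; listed in the umbrella JIRA
      // (HIVE-19509) so it gets re-enabled once fixed rather than silently dropped.
      @Ignore("Failing for several consecutive ptest runs; tracked in HIVE-19509")
      @Test
      public void testVectorizedMapJoin() {
        // original test body stays untouched so it can be re-enabled without rework
      }
    }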
>>>>> On 5/11/18, 1:57 PM, "Vineet Garg" wrote:
>>>>>
>>>>> +1. I strongly vote for freezing commits and getting our test coverage
>>>>> into an acceptable state. We have been struggling to stabilize branch-3
>>>>> due to test failures, and releasing Hive 3.0 in its current state would
>>>>> be unacceptable.
>>>>>
>>>>> Currently there are quite a few test suites which are not even running
>>>>> and are being timed out. We have been committing patches (to both
>>>>> branch-3 and master) without test coverage for these tests.
>>>>> We should immediately figure out what's going on before we proceed with
>>>>> commits.
>>>>>
>>>>> For reference, the following test suites are timing out on master
>>>>> (https://issues.apache.org/jira/browse/HIVE-19506):
>>>>>
>>>>> TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out)
>>>>> TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out)
>>>>> TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out)
>>>>> TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out)
>>>>> TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out)
>>>>> TestTxnExIm - did not produce a TEST-*.xml file (likely timed out)
>>>>>
>>>>> Vineet
>>>>>
>>>>> On May 11, 2018, at 1:46 PM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
>>>>>
>>>>> +1 There are many problems with the test infrastructure, and in my
>>>>> opinion it has now become the number one bottleneck for the project. I
>>>>> was looking at the infrastructure yesterday and I think the current
>>>>> infrastructure (even with its own set of problems) is still
>>>>> under-utilized. I am planning to increase the number of threads that
>>>>> process the parallel test batches, to start with. It needs a restart on
>>>>> the server side. I can do it now, if folks are okay with it. Else I can
>>>>> do it over the weekend when the queue is small.
>>>>>
>>>>> I listed the improvements which I thought would be useful under
>>>>> https://issues.apache.org/jira/browse/HIVE-19425, but frankly speaking I
>>>>> am not able to devote as much time to it as I would like. I would
>>>>> appreciate it if folks who have some more time could help out.
>>>>>
>>>>> I think to start with, https://issues.apache.org/jira/browse/HIVE-19429
>>>>> will help a lot. We need to pack more test runs in parallel, and
>>>>> containers provide good isolation.
>>>>>
>>>>> For the short-term green runs, I think we should @Ignore the tests which
>>>>> have been failing for many runs. They are not really being addressed
>>>>> anyway. If people think they are important to run, we should fix them and
>>>>> only then re-enable them.
>>>>>
>>>>> Also, I feel we need a lightweight test run which we can run locally
>>>>> before submitting a patch for the full suite. That way minor issues with
>>>>> the patch can be handled locally. Maybe create a profile which runs a
>>>>> subset of important tests which are consistent. We can apply some label
>>>>> indicating that the pre-checkin local tests ran successfully, and only
>>>>> then submit for the full suite.
>>>>>
>>>>> More thoughts are welcome. Thanks for starting this conversation.
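One possible way to express such a pre-checkin subset is JUnit 4's Categories API; this is a sketch only, the SmokeTest marker and test class are hypothetical, and it is not necessarily the "profile" mechanism meant above:

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Hypothetical marker interface used to tag the small, consistently passing subset.
    interface SmokeTest {}

    public class TestExampleParser {

      // Fast, deterministic check suitable for a local pre-checkin run.
      @Category(SmokeTest.class)
      @Test
      public void testParseSimpleSelect() {
      }

      // Slower/broader test left to the full pre-commit suite.
      @Test
      public void testParseLargeQuerySet() {
      }
    }

A Maven profile could then point Surefire's category groups at the SmokeTest marker so only the tagged tests run locally before a patch is submitted for the full suite.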
>>>>> On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>>>
>>>>> I believe we have reached a state (maybe we did reach it a while ago)
>>>>> that is not sustainable anymore, as there are so many tests failing /
>>>>> timing out that it is not possible to verify whether a patch is breaking
>>>>> some critical parts of the system or not. It also seems to me that due to
>>>>> the timeouts (maybe due to infra, maybe not), ptest runs are taking even
>>>>> longer than usual, which in turn creates an even longer queue of patches.
>>>>>
>>>>> There is an ongoing effort to improve ptest usability
>>>>> (https://issues.apache.org/jira/browse/HIVE-19425), but apart from that,
>>>>> we need to make an effort to stabilize existing tests and bring the
>>>>> failure count to zero.
>>>>>
>>>>> Hence, I am suggesting *we stop committing any patch before we get a
>>>>> green run*. If someone thinks this proposal is too radical, please come
>>>>> up with an alternative, because I do not think it is OK to have the ptest
>>>>> runs in their current state. Other projects of a certain size (e.g.,
>>>>> Hadoop, Spark) are always green; we should be able to do the same.
>>>>>
>>>>> Finally, once we get to zero failures, I suggest we be less tolerant of
>>>>> committing without a clean ptest run. If there is a failure, we need to
>>>>> fix it or revert the patch that caused it, and then we continue
>>>>> developing.
>>>>>
>>>>> Please, let's all work together as a community to fix this issue; that
>>>>> is the only way to get to zero quickly.
>>>>>
>>>>> Thanks,
>>>>> Jesús
>>>>>
>>>>> PS. I assume the flaky tests will come into the discussion. Let's see
>>>>> first how many of those we have, then we can work to find a fix.
>>>
>>> --
>>> Best regards!
>>> Rui Li