We should do it in a separate thread so that people can see it with the [VOTE] subject. Some people use that as a filter in their email to know when to pay attention to things.
Alan.

On Mon, May 14, 2018 at 2:36 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:
> Will there be a separate voting thread? Or is voting on this thread sufficient for the lockdown?
>
> Thanks
> Prasanth
>
>> On May 14, 2018, at 2:34 PM, Alan Gates <alanfga...@gmail.com> wrote:
>>
>> I see there's support for this, but people are still pouring in commits. I propose we have a quick vote on this to lock down commits until we get to green. That way everyone knows we have drawn the line at a specific point. Any commits after that point would be reverted. There isn't a category in the bylaws that fits this kind of vote, but I suggest lazy majority as the most appropriate one (at least 3 votes, more +1s than -1s).
>>
>> Alan.
>>
>> On Mon, May 14, 2018 at 10:34 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
>>
>>> I worked on a few quick-fix optimizations in the Ptest infrastructure over the weekend, which reduced execution time from ~90 min to ~70 min per run. I had to restart Ptest multiple times. I resubmitted the queued patches manually, but I may have missed a few. If you have a patch that is pending pre-commit and you don't see it in the queue, please submit it manually, or let me know if you don't have access to the Jenkins job. I will continue to work on the sub-tasks in HIVE-19425 and will do some maintenance next weekend as well.
>>>
>>> On Mon, May 14, 2018 at 7:42 AM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>
>>>> Vineet has already been working on disabling the tests that were timing out. I am working on disabling those that have consistently generated different q files over the last n ptest runs. I am keeping track of all these tests in https://issues.apache.org/jira/browse/HIVE-19509.
>>>>
>>>> -Jesús
>>>>
>>>> On 5/14/18, 2:25 AM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
>>>>
>>>> +1 on freezing commits until we get repeated green test runs. We should probably disable the flaky tests (and track them in a JIRA so we remember to re-enable them at a later point) to get repeated green runs.
>>>>
>>>> Thanks
>>>> Prasanth
>>>>
>>>> On Mon, May 14, 2018 at 2:15 AM -0700, "Rui Li" <lirui.fu...@gmail.com> wrote:
>>>>
>>>> +1 to freezing commits until we stabilize.
>>>>
>>>> On Sat, May 12, 2018 at 6:10 AM, Vihang Karajgaonkar wrote:
>>>>
>>>>> In order to understand the end-to-end precommit flow, I would like to get access to the PreCommit-HIVE-Build Jenkins script. Does anyone know how I can get that?
>>>>>
>>>>> On Fri, May 11, 2018 at 2:03 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>>>
>>>>>>> For the short-term green runs, I think we should @Ignore the tests which have been failing for many runs. They are not being addressed anyway. If people think they are important to run, we should fix them and only then re-enable them.
>>>>>>
>>>>>> I think that is a good idea, as it would minimize the time that we halt development. We can create a JIRA listing all the tests that were failing and that we disabled to get the clean run. From that moment, we will have zero tolerance towards committing with failing tests. And we need to pick out the tests that should not stay ignored and bring them back once they pass. If there is no disagreement, I can start working on that.
>>>>>>
>>>>>> Once I am done, I can try to help with infra tickets too.
>>>>>>
>>>>>> -Jesús
>>>>>>
>>>>>> On 5/11/18, 1:57 PM, "Vineet Garg" wrote:
>>>>>>
>>>>>> +1. I strongly vote for freezing commits and getting our test coverage into an acceptable state. We have been struggling to stabilize branch-3 due to test failures, and releasing Hive 3.0 in its current state would be unacceptable.
>>>>>>
>>>>>> Currently there are quite a few test suites that are not even running because they time out. We have been committing patches (to both branch-3 and master) without coverage from these suites. We should immediately figure out what’s going on before we proceed with commits.
>>>>>>
>>>>>> For reference, the following test suites are timing out on master (https://issues.apache.org/jira/browse/HIVE-19506):
>>>>>>
>>>>>> TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out)
>>>>>> TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out)
>>>>>> TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out)
>>>>>> TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out)
>>>>>> TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out)
>>>>>> TestTxnExIm - did not produce a TEST-*.xml file (likely timed out)
>>>>>>
>>>>>> Vineet
>>>>>>
>>>>>> On May 11, 2018, at 1:46 PM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
>>>>>>
>>>>>> +1. There are many problems with the test infrastructure, and in my opinion it has now become the number one bottleneck for the project.
>>>>>> I was looking at the infrastructure yesterday, and I think the current infrastructure (even with its own set of problems) is still under-utilized. To start with, I am planning to increase the number of threads that process the parallel test batches. That needs a restart on the server side. I can do it now if folks are okay with it; otherwise I can do it over the weekend when the queue is small.
>>>>>>
>>>>>> I listed the improvements I thought would be useful under https://issues.apache.org/jira/browse/HIVE-19425, but frankly speaking I am not able to devote as much time to it as I would like. I would appreciate it if folks who have more time could help out.
>>>>>>
>>>>>> To start with, I think https://issues.apache.org/jira/browse/HIVE-19429 will help a lot. We need to pack more test runs in parallel, and containers provide good isolation.
>>>>>>
>>>>>> For the short-term green runs, I think we should @Ignore the tests which have been failing for many runs. They are not being addressed anyway. If people think they are important to run, we should fix them and only then re-enable them.
>>>>>>
>>>>>> Also, I feel we need a lightweight test run that we can run locally before submitting a patch for the full suite. That way minor issues with the patch can be handled locally. Maybe create a profile that runs a subset of important, consistently passing tests. We can apply a label indicating that the pre-checkin local tests ran successfully, and only then submit the patch for the full suite.
>>>>>>
>>>>>> More thoughts are welcome. Thanks for starting this conversation.
>>>>>>
>>>>>> On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>>>>
>>>>>> I believe we have reached a state (maybe we reached it a while ago) that is no longer sustainable: so many tests are failing or timing out that it is not possible to verify whether a patch breaks critical parts of the system. It also seems to me that, due to the timeouts (maybe due to infra, maybe not), ptest runs are taking even longer than usual, which in turn creates an even longer queue of patches.
>>>>>>
>>>>>> There is an ongoing effort to improve ptest usability (https://issues.apache.org/jira/browse/HIVE-19425), but apart from that, we need to make an effort to stabilize the existing tests and bring the failure count to zero.
>>>>>>
>>>>>> Hence, I am suggesting *we stop committing any patch before we get a green run*. If someone thinks this proposal is too radical, please come up with an alternative, because I do not think it is OK to have the ptest runs in their current state. Other projects of a certain size (e.g., Hadoop, Spark) are always green; we should be able to do the same.
>>>>>>
>>>>>> Finally, once we get to zero failures, I suggest we be less tolerant of committing without a clean ptest run. If there is a failure, we need to fix it or revert the patch that caused it; then we continue developing.
>>>>>>
>>>>>> Please, let’s all work together as a community to fix this issue; that is the only way to get to zero quickly.
>>>>>>
>>>>>> Thanks,
>>>>>> Jesús
>>>>>>
>>>>>> PS. I assume the flaky tests will come into the discussion.
>>>>>> Let's first see how many of those we have; then we can work to find a fix.
>>>>
>>>> --
>>>> Best regards!
>>>> Rui Li