Wow! Awesome. This is the 3rd time I remember seeing green run in >4yrs. :)
Thanks
Prasanth

> On May 15, 2018, at 5:28 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>
> We have just had the first clean run in a while:
> https://builds.apache.org/job/PreCommit-HIVE-Build/10971/testReport/
>
> I will continue monitoring follow-up runs.
>
> Thanks,
> -Jesús
>
> On 5/14/18, 11:28 PM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
>
> Wondering if we can add a state transition from “Patch Available” to
> “Ready To Commit” which can only be triggered by ptest bot on green test run.
>
> Thanks
> Prasanth
>
> On Mon, May 14, 2018 at 10:44 PM -0700, "Jesus Camacho Rodriguez" <jcama...@apache.org> wrote:
>
> I have been working on fixing this situation while commits were still
> coming in.
>
> All the tests that have been disabled are in:
> https://issues.apache.org/jira/browse/HIVE-19509
> I have created new issues to reenable each of them; they are linked to
> that issue.
> Maybe I was slightly aggressive disabling some of the tests, however that
> seemed to be the only way to bring the test failures with age count > 1 to
> zero.
>
> Instead of starting a vote to freeze the commits in another thread, I will
> start a vote to be stricter wrt committing to master, i.e., only commit if we
> get a clean QA run.
>
> We can discuss more about this issue over there.
>
> Thanks,
> Jesús
>
> On 5/14/18, 4:11 PM, "Sergey Shelukhin" wrote:
>
> Can we please make this freeze conditional, i.e. we unfreeze automatically
> after ptest is clean (as evidenced by the clean HiveQA run on a given
> JIRA).
>
> On 18/5/14, 15:16, "Alan Gates" wrote:
>
>> We should do it in a separate thread so that people can see it with the
>> [VOTE] subject. Some people use that as a filter in their email to know
>> when to pay attention to things.
>>
>> Alan.
>>
>> On Mon, May 14, 2018 at 2:36 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:
>>
>>> Will there be a separate voting thread?
>>> Or the voting on this thread is sufficient for lock down?
>>>
>>> Thanks
>>> Prasanth
>>>
>>>> On May 14, 2018, at 2:34 PM, Alan Gates wrote:
>>>>
>>>> I see there's support for this, but people are still pouring in commits.
>>>> I propose we have a quick vote on this to lock down the commits until we
>>>> get to green. That way everyone knows we have drawn the line at a specific
>>>> point. Any commits after that point would be reverted. There isn't a
>>>> category in the bylaws that fits this kind of vote but I suggest lazy
>>>> majority as the most appropriate one (at least 3 votes, more +1s than
>>>> -1s).
>>>>
>>>> Alan.
>>>>
>>>> On Mon, May 14, 2018 at 10:34 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
>>>>
>>>>> I worked on a few quick-fix optimizations in Ptest infrastructure over the
>>>>> weekend which reduced the execution time from ~90 min to ~70 min per run. I
>>>>> had to restart Ptest multiple times. I was resubmitting the patches which
>>>>> were in the queue manually, but I may have missed a few. In case you have a
>>>>> patch which is pending pre-commit and you don't see it in the queue, please
>>>>> submit it manually or let me know if you don't have access to the jenkins
>>>>> job. I will continue to work on the sub-tasks in HIVE-19425 and will do
>>>>> some maintenance next weekend as well.
>>>>>
>>>>> On Mon, May 14, 2018 at 7:42 AM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>>>
>>>>>> Vineet has already been working on disabling those tests that were timing
>>>>>> out. I am working on disabling those that are generating different q files
>>>>>> consistently for the last n ptest runs. I am keeping track of all these tests
>>>>>> in https://issues.apache.org/jira/browse/HIVE-19509.
>>>>>>
>>>>>> -Jesús
>>>>>>
>>>>>> On 5/14/18, 2:25 AM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> wrote:
>>>>>>
>>>>>> +1 on freezing commits until we get repetitive green tests. We should
>>>>>> probably disable (and remember in a jira to reenable them at a later point)
>>>>>> tests that are flaky to get repetitive green test runs.
>>>>>>
>>>>>> Thanks
>>>>>> Prasanth
>>>>>>
>>>>>> On Mon, May 14, 2018 at 2:15 AM -0700, "Rui Li" <lirui.fu...@gmail.com> wrote:
>>>>>>
>>>>>> +1 to freezing commits until we stabilize
>>>>>>
>>>>>> On Sat, May 12, 2018 at 6:10 AM, Vihang Karajgaonkar wrote:
>>>>>>
>>>>>>> In order to understand the end-to-end precommit flow I would like to get
>>>>>>> access to the PreCommit-HIVE-Build jenkins script. Does anyone know how
>>>>>>> I can get that?
>>>>>>>
>>>>>>> On Fri, May 11, 2018 at 2:03 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>>>>>
>>>>>>>> Bq. For the short term green runs, I think we should @Ignore the tests which
>>>>>>>> are known to be failing since many runs. They are anyways not being
>>>>>>>> addressed as such. If people think they are important to be run we should
>>>>>>>> fix them and only then re-enable them.
>>>>>>>>
>>>>>>>> I think that is a good idea, as we would minimize the time that we halt
>>>>>>>> development. We can create a JIRA where we list all tests that were
>>>>>>>> failing and that we have disabled to get the clean run. From that moment, we
>>>>>>>> will have zero tolerance towards committing with failing tests. And we need
>>>>>>>> to pick up those tests that should not be ignored and bring them up again
>>>>>>>> but passing. If there is no disagreement, I can start working on that.
>>>>>>>>
>>>>>>>> Once I am done, I can try to help with infra tickets too.
>>>>>>>>
>>>>>>>> -Jesús
>>>>>>>>
>>>>>>>> On 5/11/18, 1:57 PM, "Vineet Garg" wrote:
>>>>>>>>
>>>>>>>> +1. I strongly vote for freezing commits and getting our testing
>>>>>>>> coverage in an acceptable state. We have been struggling to stabilize
>>>>>>>> branch-3 due to test failures, and releasing Hive 3.0 in its current state
>>>>>>>> would be unacceptable.
>>>>>>>>
>>>>>>>> Currently there are quite a few test suites which are not even running
>>>>>>>> and are being timed out. We have been committing patches (to both branch-3
>>>>>>>> and master) without test coverage for these tests.
>>>>>>>> We should immediately figure out what’s going on before we proceed
>>>>>>>> with commits.
>>>>>>>>
>>>>>>>> For reference, the following test suites are timing out on master:
>>>>>>>> (https://issues.apache.org/jira/browse/HIVE-19506)
>>>>>>>>
>>>>>>>> TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out)
>>>>>>>>
>>>>>>>> TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out)
>>>>>>>>
>>>>>>>> TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out)
>>>>>>>>
>>>>>>>> TestNonCatCallsWithCatalog - did not produce a TEST-*.xml file (likely timed out)
>>>>>>>>
>>>>>>>> TestSequenceFileReadWrite - did not produce a TEST-*.xml file (likely timed out)
>>>>>>>>
>>>>>>>> TestTxnExIm - did not produce a TEST-*.xml file (likely timed out)
>>>>>>>>
>>>>>>>> Vineet
>>>>>>>>
>>>>>>>> On May 11, 2018, at 1:46 PM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>> +1 There are many problems with the test infrastructure and in my opinion
>>>>>>>> it has now become the number one bottleneck for the project.
>>>>>>>> I was looking at the infrastructure yesterday and I think the current
>>>>>>>> infrastructure (even with its own set of problems) is still under-utilized.
>>>>>>>> I am planning to increase the number of threads to process the parallel
>>>>>>>> test batches to start with. It needs a restart on the server side. I can
>>>>>>>> do it now, if folks are okay with it. Else I can do it over the weekend
>>>>>>>> when the queue is small.
>>>>>>>>
>>>>>>>> I listed the improvements which I thought would be useful under
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-19425 but frankly speaking I am
>>>>>>>> not able to devote as much time as I would like to on it. I would
>>>>>>>> appreciate it if folks who have some more time can help out.
>>>>>>>>
>>>>>>>> I think to start with https://issues.apache.org/jira/browse/HIVE-19429 will
>>>>>>>> help a lot. We need to pack more test runs in parallel and containers
>>>>>>>> provide good isolation.
>>>>>>>>
>>>>>>>> For the short term green runs, I think we should @Ignore the tests which
>>>>>>>> are known to be failing since many runs. They are anyways not being
>>>>>>>> addressed as such. If people think they are important to be run we should
>>>>>>>> fix them and only then re-enable them.
>>>>>>>>
>>>>>>>> Also, I feel we need a light-weight test run which we can run locally before
>>>>>>>> submitting a patch for the full suite. That way minor issues with the patch
>>>>>>>> can be handled locally. Maybe create a profile which runs a subset of
>>>>>>>> important tests which are consistent. We can apply some label that the
>>>>>>>> pre-checkin-local tests have run successfully and only then submit
>>>>>>>> for the full suite.
>>>>>>>>
>>>>>>>> More thoughts are welcome.
>>>>>>>> Thanks for starting this conversation.
>>>>>>>>
>>>>>>>> On Fri, May 11, 2018 at 1:27 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:
>>>>>>>>
>>>>>>>> I believe we have reached a state (maybe we did reach it a while ago) that
>>>>>>>> is not sustainable anymore, as there are so many tests failing / timing out
>>>>>>>> that it is not possible to verify whether a patch is breaking some critical
>>>>>>>> parts of the system or not. It also seems to me that due to the timeouts
>>>>>>>> (maybe due to infra, maybe not), ptest runs are taking even longer than
>>>>>>>> usual, which in turn creates an even longer queue of patches.
>>>>>>>>
>>>>>>>> There is an ongoing effort to improve ptests usability
>>>>>>>> (https://issues.apache.org/jira/browse/HIVE-19425), but apart from that,
>>>>>>>> we need to make an effort to stabilize existing tests and bring that
>>>>>>>> failure count to zero.
>>>>>>>>
>>>>>>>> Hence, I am suggesting *we stop committing any patch before we get a green
>>>>>>>> run*. If someone thinks this proposal is too radical, please come up with
>>>>>>>> an alternative, because I do not think it is OK to have the ptest runs in
>>>>>>>> their current state. Other projects of a certain size (e.g., Hadoop, Spark)
>>>>>>>> are always green; we should be able to do the same.
>>>>>>>>
>>>>>>>> Finally, once we get to zero failures, I suggest we are less tolerant with
>>>>>>>> committing without getting a clean ptests run. If there is a failure, we
>>>>>>>> need to fix it or revert the patch that caused it, then we continue
>>>>>>>> developing.
>>>>>>>>
>>>>>>>> Please, let’s all work together as a community to fix this issue; that is
>>>>>>>> the only way to get to zero quickly.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jesús
>>>>>>>>
>>>>>>>> PS. I assume the flaky tests will come into the discussion. Let's see
>>>>>>>> first how many of those we have, then we can work to find a fix.
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li