I personally checked the "soft" errors and fixed them in my patches if a second run was needed (which was the case more often than not, because of the flaky tests anyway :() I also often checked the output before my reviews and asked contributors to fix the reported issues. Running these checks manually for every patch would be an extra chore I would rather avoid.

So I agree that we should start enforcing the rules, but if there is an easy way to run them, I would like to see yetus runs in the new system until we get to the "promised land" :) and have the enforced rules in place.

That said, even if we move forward only with the enforced rules, we can immediately start with the following checks:
- Rat check
- Apache license header
- Whitespace check for all non-generated files

Just my 2 cents

> On Jun 5, 2020, at 11:18, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>
> +1 for failing fast starting with findbugs and eventually covering the important points from checkstyle.
>
> Best,
> Stamatis
>
> On Fri, Jun 5, 2020 at 9:35 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>
>> Hey Mustafa!
>>
>> Those checks are not executed anymore in the new system. I always felt it a bit confusing to have a comment which reports about checkstyle/findbugs/etc issues, while getting a green test run was almost impossible due to the high number of randomly failing tests.
>> So I don't think it's viable that someone will re-submit the patch with style changes.
>>
>> I think the old approach is a "soft" way of enforcing code quality - in which I personally don't believe: code quality should be enforced by rules/quality gates/etc.
>>
>> So I would like to take a different approach... I think we should definitely re-introduce these checks - however without "tolerance" being built in. This will most likely mean that we will have to soften the ruleset first; but then we may gradually raise the bar to a higher level.
>>
>> The "without tolerance" part means that this will be checked during (or right after) the build phase - so if you make quality mistakes you will just not get test results (it will also save resources).
>>
>> Yesterday Laszlo opened a ticket about fixing findbugs issues - if we fix those issues but never enforce failing the build, someone might just add a few more.
>>
>> To increase code quality throughout the project I think we could take a bottom-up approach:
>> * first patch:
>>   * fix things in a low-level module (like common or storage-api)
>>   * it should also add the necessary maven & job changes to enforce things during precommit (up to that module)
>> * followups:
>>   * raise the bar to higher-level modules
>>
>> Obviously we can't do this for something like checkstyle, which detects a myriad of small issues:
>> * the ruleset should be shrunk to something which needs a reasonable amount of work to start enforcing
>> * later we can enable further rules/fix all of them in the project
>>
>> What do you think about this?
>>
>> cheers,
>> Zoltan
>>
>> On 6/5/20 2:47 AM, Mustafa IMAN wrote:
>>> Thank you Zoltan for all this work.
>>> I see many PRs are merged based on the new workflow already. The old workflow generates many reports like ASF license/findbugs/checkstyle etc. I don't see these in the new Github PR workflow. I am concerned the codebase is going to suffer from the lack of these reports very quickly. Do these checks still happen but are not visible?
>>>
>>> On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to note that you may log in to the jenkins instance - to start/kill builds (or create new jobs).
>>>> I've configured github oauth - but since team membership can't be queried from the "apache organization", it's harder to configure all "hive committers".
>>>> However... I think I've made it available for most of us - if you can't start builds/etc just let me know your github user and I'll add it.
>>>>
>>>> cheers,
>>>> Zoltan
>>>>
>>>> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
>>>>> Hey all!
>>>>>
>>>>> The patch is now in master - so every new PR or a push on it will trigger a new run.
>>>>>
>>>>> Please decide which one you would like to use - open a PR to see the new one work... or upload a patch file to the jira - but please don't do both, because in that case 2 executions will happen.
>>>>>
>>>>> The job execution time (2-4 hours) of a single run is a bit higher than the usual on the ptest server - this is mostly to increase throughput.
>>>>>
>>>>> The patch also disabled a set of tests; I will send the full list of skipped tests shortly.
>>>>>
>>>>> cheers,
>>>>> Zoltan
>>>>>
>>>>> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
>>>>>> Hello all!
>>>>>>
>>>>>> The new stuff is ready to be switched over to. It needs to be merged into master - and after that anyone who opens a PR will get a run by the new HiveQA infra.
>>>>>> I propose to run the 2 systems side-by-side for some time - the regular master builds will start, and we will see how frequently that is polluted by flaky tests.
>>>>>>
>>>>>> Note that the current patch also disables around ~25 more tests to increase stability - to get a better overview of the disabled tests I think the "direction of the information flow" should be altered; what I mean by that is: instead of just throwing in a jira for "disable test x" and opening a new one like "fix test x", only open the latter and place the jira reference into the ignore message; meanwhile also add a regular report about the actually disabled tests - so people who do know about the importance of a particular test can get involved.
>>>>>>
>>>>>> Note: the builds.apache.org instance will be shut down somewhere in the future as well... but I think the new one is a good-enough alternative to not have to migrate the Hive-precommit job over to https://ci-hadoop.apache.org/.
>>>>>>
>>>>>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
>>>>>> https://issues.apache.org/jira/browse/HIVE-22942
>>>>>> https://github.com/apache/hive/pull/948/files
>>>>>>
>>>>>> cheers,
>>>>>> Zoltan
>>>>>>
>>>>>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
>>>>>>> Hey!
>>>>>>>
>>>>>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>>>>>>>> Thank you for all of your efforts, this looks really promising. With moving to github PRs, would that also mean that we move away from the reviewboard for code review?
>>>>>>>
>>>>>>> I didn't think about that. I think using github's review interface will remain optional, because both review systems have their own strong points - I wouldn't force anyone to use one over the other. (For some patches reviewboard is much better, because it's able to track content moves a bit better than github - meanwhile github has a small feature that enables marking files as reviewed.)
>>>>>>> As a matter of fact, we sometimes had patches on the jiras which had neither an RB nor a PR to review them - having a PR there will at least make it easier for reviewers to comment.
>>>>>>>
>>>>>>>> Also, what happens if a PR is updated?
>>>>>>>> Will the tests run for both or just for the latest version?
>>>>>>>
>>>>>>> It will trigger a new build - if there is already a build in progress, that will prevent a new build from starting until it finishes... and there is also a 5 builds/day limit, which might induce some wait.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Zoltan
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <k...@rxd.hu> wrote:
>>>>>>>>
>>>>>>>>> Hello all!
>>>>>>>>>
>>>>>>>>> The proposed system has become more stable lately - and I think I've solved a few sources of flakiness.
>>>>>>>>> To be really usable I also wanted to add a way to dynamically enable/disable a set of tests (for example the replication tests take ~7 hours to execute out of the total of 24 hours - and they are also a bit unstable, so not running them when not necessary would be beneficial in multiple ways) - but the best way to do this would be to throw in junit5; unfortunately the current ptest installation uses maven 3.0.5 which doesn't like these kinds of things - so instead of hacking a fix for that... I've removed it from the dev branch for now.
>>>>>>>>>
>>>>>>>>> I would like to propose to start an evaluation phase of the new test procedures (INFRA-20269).
>>>>>>>>> The process would look something like this:
>>>>>>>>> * someone opens a PR - the tests will be run on the changes
>>>>>>>>> * on every active branch the tests will run from time to time
>>>>>>>>>   * this will produce a bunch of test runs on the master branch as well, which will show how well the tests behave on the master branch without any patches
>>>>>>>>> * runs on branches (PRs or active development branches (eg: master)) will be rate limited to 5 builds/day
>>>>>>>>> * at most ~4 builds at a time - to maximize resource usage
>>>>>>>>> * turnaround time for a build is right now 2 hours - which I feel is a balanced choice between speed and response time
>>>>>>>>>
>>>>>>>>> Possible future benefits:
>>>>>>>>> * toggle features using github tags
>>>>>>>>> * optional testgroups (metastore/replication) tests
>>>>>>>>> * ability to run the metastore verification tests
>>>>>>>>> * possibility to add smoke tests
>>>>>>>>>
>>>>>>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond the new Jenkinsfile which defines the full logic: although I've sunk a lot of time into fixing all kinds of flaky tests, I would still like to disable around ~25 tests.
>>>>>>>>>
>>>>>>>>> I also would like to propose a method to verify the stability of a single test: run it 100 times in series at the same place where the precommit tests are running.
>>>>>>>>> This will put the bar high enough that only totally stable tests can satisfy it (a 99% stable test has only a 36% chance to pass this without being caught :D)
>>>>>>>>> After this is in service it could be used to validate that an existing test is unstable (before disabling it) - and then used again to prove that it got fixed before re-enabling it.
>>>>>>>>>
>>>>>>>>> Please let me know what you think!
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> Zoltan
>>>>>>>>>
>>>>>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>>>>>>>> Hey All!
>>>>>>>>>>
>>>>>>>>>> I was planning to replace the ptest stuff with something less complex for a while now - I see that we struggle a lot because ptest is more complicated than it should be...
>>>>>>>>>> It would be much better if it were constructed from well-made existing CI pieces - because of that I started working on [1] a few months ago.
>>>>>>>>>>
>>>>>>>>>> It has its pros and cons... but it's not the same as the existing ptest stuff.
>>>>>>>>>> I've collected some info about how it compares against the existing one - but it became too long so I've moved it into a google docs document at [3].
>>>>>>>>>>
>>>>>>>>>> It's not yet ready... I still have some remaining problems/concerns/etc:
>>>>>>>>>> * what do you think about changing to a github PR based workflow?
>>>>>>>>>> * it will not support things like "isolation" at all - so we will have to make our tests work with each other without bending the rules...
>>>>>>>>>> * I've tried to overcommit the cpu resources, which creates a noisier environment for the actual tests - this squeezes out some new problems which should be fixed before this could be enabled.
>>>>>>>>>> * for every PR the first run is somewhat sub-optimal... there are some reasons for this - the actually used resources are the same, but the overall execution time is not optimal; I can accept this as a compromise because right now I wait 24 hours for a precommit run.
>>>>>>>>>>
>>>>>>>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>>>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>>>>>>>> * open a PR against my hive repo on github [5]
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> Zoltan
>>>>>>>>>>
>>>>>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>>>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>>>>>>>> [3] https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>>>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>>>>>>>> [5] https://github.com/kgyrtkirk/hive/
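
PS: regarding the "place the jira reference into the ignore message" idea from the quoted thread above - just to illustrate, a disabled test could look roughly like the sketch below. This is only a minimal JUnit 4 example; the class name and HIVE-99999 are made up for illustration and are not a real test or jira.

    import org.junit.Ignore;
    import org.junit.Test;

    // Hypothetical test class, only to illustrate the proposed convention.
    public class TestSomethingFlaky {

      // Instead of a separate "disable test X" jira, the ignore message points
      // straight at the single "fix test X" follow-up jira (HIVE-99999 is a placeholder).
      @Ignore("HIVE-99999: flaky, re-enable once the fix is in")
      @Test
      public void testFlakyScenario() throws Exception {
        // test body stays unchanged; only the annotation records why it is skipped
      }
    }

With a convention like this, the "regular report about the actually disabled tests" could presumably be produced by simply scanning the ignore messages for jira references, so the list of skipped tests stays visible to everyone.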