I personally checked the "soft" errors and fixed them in my patches if a second run was needed (which was the case more often than not, because of the flaky tests anyway :() I also often checked the output before my reviews and asked contributors to fix the reported issues. Running these checks manually for every patch would be an extra chore I would rather avoid.

So I agree that we should start enforcing the rules, but if there is an easy way to run them, I would like to see yetus runs in the new system until we get to the "promised land" :) and have the enforced rules in place.

That said, even if we move forward only with the enforced rules, we can immediately start with the following checks:
- Rat check
- Apache license header
- Whitespace check for all non-generated files

Just my 2 cents

> On Jun 5, 2020, at 11:18, Stamatis Zampetakis <zabe...@gmail.com> wrote:
>
> +1 for failing fast starting with findbugs and eventually covering the important points from checkstyle.
>
> Best,
> Stamatis
>
> On Fri, Jun 5, 2020 at 9:35 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>
>> Hey Mustafa!
>>
>> Those checks are not executed anymore in the new system. I always felt it a bit confusing to have a comment which reports about checkstyle/findbugs/etc issues, while getting a green test run was almost impossible due to the high number of randomly failing tests.
>> So I don't think it's viable that someone will re-submit the patch with style changes.
>>
>> I think the old approach is a "soft" way of enforcing code quality - in which I personally don't believe: code quality should be enforced by rules/quality gates/etc.
>>
>> So I would like to take a different approach... I think we should definitely re-introduce these checks - however without "tolerance" being built in. This will most likely mean that we will have to soften the ruleset first; but then we may gradually raise the bar to a higher level.
>>
>> The "without tolerance" part means that this will be checked during (or right after) the build phase - so if you make quality mistakes you will just not get test results (it will also save resources).
>>
>> Yesterday Laszlo opened a ticket about fixing findbugs issues - if we fix those issues but never enforce failing the build, someone might just add a few more.
>>
>> To increase code quality throughout the project I think we could take a bottom-up approach:
>> * first patch:
>>   * fix things in a low-level module (like common or storage-api)
>>   * it should also add the necessary maven & job changes to enforce things during precommit (up to that module)
>> * followups:
>>   * raise the bar to higher-level modules
>>
>> Obviously we can't do this for something like checkstyle, which detects a myriad of small issues:
>> * the ruleset should be shrunk to something which needs a reasonable amount of work to start enforcing
>> * later we can enable further rules/fix all of them in the project
>>
>> What do you think about this?
>>
>> cheers,
>> Zoltan
>>
>> On 6/5/20 2:47 AM, Mustafa IMAN wrote:
>>> Thank you Zoltan for all this work.
>>> I see many PRs are merged based on the new workflow already. The old workflow generates many reports like ASF license/findbugs/checkstyle etc. I don't see these in the new Github PR workflow. I am concerned the codebase is going to suffer from the lack of these reports very quickly. Do these checks still happen but are not visible?
>>>
>>> On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <k...@rxd.hu> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to note that you may log in to the jenkins instance - to start/kill builds (or create new jobs).
>>>> I've configured github oauth - but since team membership can't be queried from the "apache organization", it's harder to configure all "hive committers".
>>>> However... I think I've made it available for most of us - if you can't start builds/etc just let me know your github user and I'll add it.
>>>>
>>>> cheers,
>>>> Zoltan
>>>>
>>>> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
>>>>> Hey all!
>>>>>
>>>>> The patch is now in master - so every new PR or a push on it will trigger a new run.
>>>>>
>>>>> Please decide which one you would like to use - open a PR to see the new one work... or upload a patch file to the jira - but please don't do both, because in that case 2 executions will happen.
>>>>>
>>>>> The job execution time (2-4 hours) of a single run is a bit higher than the usual on the ptest server - this is mostly to increase throughput.
>>>>>
>>>>> The patch also disabled a set of tests; I will send the full list of skipped tests shortly.
>>>>>
>>>>> cheers,
>>>>> Zoltan
>>>>>
>>>>> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
>>>>>> Hello all!
>>>>>>
>>>>>> The new stuff is ready to be switched over to. It needs to be merged into master - and after that anyone who opens a PR will get a run by the new HiveQA infra.
>>>>>> I propose to run the 2 systems side-by-side for some time - the regular master builds will start, and we will see how frequently that is polluted by flaky tests.
>>>>>>
>>>>>> Note that the current patch also disables around ~25 more tests to increase stability - to get a better overview of the disabled tests I think the "direction of the information flow" should be altered; what I mean by that is: instead of just throwing in a jira for "disable test x" and opening a new one like "fix test x", only open the latter and place the jira reference into the ignore message; meanwhile also add a regular report about the actually disabled tests - so people who do know about the importance of a particular test can get involved.
>>>>>>
>>>>>> Note: the builds.apache.org instance will be shut down somewhere in the future as well... but I think the new one is a good-enough alternative to not have to migrate the Hive-precommit job over to https://ci-hadoop.apache.org/.
>>>>>>
>>>>>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
>>>>>> https://issues.apache.org/jira/browse/HIVE-22942
>>>>>> https://github.com/apache/hive/pull/948/files
>>>>>>
>>>>>> cheers,
>>>>>> Zoltan
>>>>>>
>>>>>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
>>>>>>> Hey!
>>>>>>>
>>>>>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>>>>>>>> Thank you for all of your efforts, this looks really promising. With moving to github PRs, would that also mean that we move away from the reviewboard for code review?
>>>>>>>
>>>>>>> I didn't think about that. I think using github's review interface will remain optional, because both review systems have their own strong points - I wouldn't force anyone to use one over the other. (For some patches reviewboard is much better, because it's able to track content moves a bit better than github - meanwhile github has a small feature that enables marking files as reviewed.)
>>>>>>> As a matter of fact, we sometimes had patches on the jiras which had neither an RB nor a PR to review them - having a PR there will at least make it easier for reviewers to comment.
>>>>>>>
>>>>>>>> Also, what happens if a PR is updated?
>>>>>>>> Will the tests run for both or just for the latest version?
>>>>>>>
>>>>>>> It will trigger a new build - if there is already a build in progress, that will prevent a new build from starting until it finishes... and there is also a 5 builds/day limit, which might induce some wait.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Zoltan
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <k...@rxd.hu> wrote:
>>>>>>>>
>>>>>>>>> Hello all!
>>>>>>>>>
>>>>>>>>> The proposed system has become more stable lately - and I think I've solved a few sources of flakiness.
>>>>>>>>> To be really usable I also wanted to add a way to dynamically enable/disable a set of tests (for example the replication tests take ~7 hours to execute out of the total of 24 hours - and they are also a bit unstable, so not running them when not necessary would be beneficial in multiple ways) - but the best way to do this would be to throw in junit5; unfortunately the current ptest installation uses maven 3.0.5 which doesn't like these kinds of things - so instead of hacking a fix for that... I've removed it from the dev branch for now.
>>>>>>>>>
>>>>>>>>> I would like to propose to start an evaluation phase of the new test procedures (INFRA-20269).
>>>>>>>>> The process would look something like this:
>>>>>>>>> * someone opens a PR - the tests will be run on the changes
>>>>>>>>> * on every active branch the tests will run from time to time
>>>>>>>>>   * this will produce a bunch of test runs on the master branch as well, which will show how well the tests behave on the master branch without any patches
>>>>>>>>> * runs on branches (PRs or active development branches (eg: master)) will be rate limited to 5 builds/day
>>>>>>>>> * at most ~4 builds at a time - to maximize resource usage
>>>>>>>>> * turnaround time for a build is right now 2 hours - which I feel is a balanced choice between speed and response time
>>>>>>>>>
>>>>>>>>> Possible future benefits:
>>>>>>>>> * toggle features using github tags
>>>>>>>>> * optional testgroups (metastore/replication) tests
>>>>>>>>> * ability to run the metastore verification tests
>>>>>>>>> * possibility to add smoke tests
>>>>>>>>>
>>>>>>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond the new Jenkinsfile which defines the full logic: although I've sunk a lot of time into fixing all kinds of flaky tests, I would still like to disable around ~25 tests.
>>>>>>>>>
>>>>>>>>> I also would like to propose a method to verify the stability of a single test: run it 100 times in series at the same place where the precommit tests are running.
>>>>>>>>> This will put the bar high enough that only totally stable tests can satisfy it (a 99% stable test has only a 36% chance to pass this without being caught :D)
>>>>>>>>> After this is in service it could be used to validate that an existing test is unstable (before disabling it) - and then used again to prove that it got fixed before re-enabling it.
>>>>>>>>>
>>>>>>>>> Please let me know what you think!
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> Zoltan
>>>>>>>>>
>>>>>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>>>>>>>> Hey All!
>>>>>>>>>>
>>>>>>>>>> I was planning to replace the ptest stuff with something less complex for a while now - I see that we struggle a lot because ptest is more complicated than it should be...
>>>>>>>>>> It would be much better if it were constructed from well-made existing CI pieces - because of that I started working on [1] a few months ago.
>>>>>>>>>>
>>>>>>>>>> It has its pros and cons... but it's not the same as the existing ptest stuff.
>>>>>>>>>> I've collected some info about how it compares against the existing one - but it became too long so I've moved it into a google docs document at [3].
>>>>>>>>>>
>>>>>>>>>> It's not yet ready... I still have some remaining problems/concerns/etc:
>>>>>>>>>> * what do you think about changing to a github PR based workflow?
>>>>>>>>>> * it will not support things like "isolation" at all - so we will have to make our tests work with each other without bending the rules...
>>>>>>>>>> * I've tried to overcommit the cpu resources, which creates a noisier environment for the actual tests - this squeezes out some new problems which should be fixed before this could be enabled.
>>>>>>>>>> * for every PR the first run is somewhat sub-optimal... there are some reasons for this - the actually used resources are the same, but the overall execution time is not optimal; I can accept this as a compromise because right now I wait 24 hours for a precommit run.
>>>>>>>>>>
>>>>>>>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>>>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>>>>>>>> * open a PR against my hive repo on github [5]
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> Zoltan
>>>>>>>>>>
>>>>>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>>>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>>>>>>>> [3] https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>>>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>>>>>>>> [5] https://github.com/kgyrtkirk/hive/
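
PS: regarding the "place the jira reference into the ignore message" idea from the quoted thread above - just to illustrate, a disabled test could look roughly like the sketch below. This is only a minimal JUnit 4 example; the class name and HIVE-99999 are made up for illustration and are not a real test or jira.

    import org.junit.Ignore;
    import org.junit.Test;

    // Hypothetical test class, only to illustrate the proposed convention.
    public class TestSomethingFlaky {

      // Instead of a separate "disable test X" jira, the ignore message points
      // straight at the single "fix test X" follow-up jira (HIVE-99999 is a placeholder).
      @Ignore("HIVE-99999: flaky, re-enable once the fix is in")
      @Test
      public void testFlakyScenario() throws Exception {
        // test body stays unchanged; only the annotation records why it is skipped
      }
    }

With a convention like this, the "regular report about the actually disabled tests" could presumably be produced by simply scanning the ignore messages for jira references, so the list of skipped tests stays visible to everyone.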