Re: [Gluster-devel] Difference in bad_tests count in mainline vs 3.7 branch

Raghavendra Talur Mon, 07 Sep 2015 13:26:20 -0700

On Fri, Sep 4, 2015 at 12:56 PM, Raghavendra Talur <[email protected]>
wrote:


>
>
>
>> Maintainers - can you please take stock of this and ensure sanity of your
>> components before merging patches that do not fix a failing test?
>>
>>
> Here is my proposal to get this fixed.
>
>
> This weekend, 5th September 0400 UTC, I will start a jenkins run on master
> and 3.7 branches.
>
>
>    - It will be re-based with code just before it is run, so all patches
>    merged by 4th September would be tested.
>    - It will run each test 10 times in succession. Why 10?
>       - Hope to find tests that fail occasionally.
>       - If a test fails only on the 1st run, it could very well be a
>       cleanup issue with the previously run test.
>       - A pattern of failures within the 10 runs is again indicative of
>       some cleanup/timeout error.
>    - It will run all tests and not stop at the first failure.
>    - I will have scripts modified to get maximum data from logs. (It will
>    still be INFO level logs)
>
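For illustration, the repeat-and-keep-going idea above could be sketched in shell roughly as follows. This is only a minimal sketch and not what run-tests.sh does: run_repeatedly and classify_results are hypothetical helper names, and the P/F result string and the four category labels are assumptions made for the example.

```shell
# Hypothetical helpers (not part of run-tests.sh).

# Run a command N times and print one letter per run: P = pass, F = fail.
# A failing run does not stop the loop.
run_repeatedly() {
    local runs="$1" i=1 results=""
    shift
    while [ "$i" -le "$runs" ]; do
        if "$@" >/dev/null 2>&1; then
            results="${results}P"
        else
            results="${results}F"
        fi
        i=$((i + 1))
    done
    echo "$results"
}

# Turn a P/F string into a rough category (labels are assumptions).
classify_results() {
    # No failure at all.
    case "$1" in
        *F*) ;;
        *)   echo "stable"; return 0 ;;
    esac
    # No pass at all.
    case "$1" in
        *P*) ;;
        *)   echo "always-fails"; return 0 ;;
    esac
    # Mixed results: look at everything after the first run.
    case "${1#?}" in
        *F*) echo "intermittent" ;;
        *)   echo "first-run-only" ;;   # possibly leftover state from the previous test
    esac
}
```

Calling run_repeatedly with 10 and whatever command runs a single .t file, then passing the resulting string to classify_results, would give one label per test. A "first-run-only" label corresponds to the cleanup suspicion described in the list above.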
> After the run completes, I will file a bug against the component of each
> .t test that fails in this run and immediately add the test to the bad
> tests list.
>
> What should the maintainers do after that?
>
>
>    - If a bug is filed against your component, please spend some time on
>    Monday and root cause the issue by Monday EOD.
>    - If the root cause proves that the bug is in the .t file:
>       - It would mostly be because
>          - The timeouts are not enough all the time. Change EXPECT_WITHIN
>          values and check.
>          - The test is not deterministic enough; some of the assumptions
>          that the test makes might not always be true. For example, a
>          SIGTERM followed by a TEST which assumes that the process is
>          definitely killed is a wrong assumption. Use SIGKILL in such
>          cases. (I know SIGKILL may not work either if the process is in
>          D state, but it's a good enough example.)
>       - It is easier to fix bugs in a .t file once the root cause is found.
>       Please fix the issue and remove the test from the bad tests list.
>       Use the bug filed against this .t file.
>    - If the root cause proves that the bug is in Gluster code:
>       - If the bug is in the same component as the .t file:
>          - In this case, you are the component owner; change the
>          description and summary of the bug filed to indicate the actual
>          issue.
>          - If the time required to fix the issue in Gluster code is
>          non-minimal:
>             - Put a workaround in the .t file, with a comment clearly
>             stating the bug number that will later fix it, and remove the
>             test from the bad tests list.
>             - If a workaround is not possible, let the test remain in the
>             bad tests list.
>       - If the bug is not in the same component as the .t file:
>          - Update the bug with details which prove that the bug is not in
>          the same component and change the component accordingly.
>          - It is the new owner's responsibility to provide a workaround
>          for all .t files hit by the issue and fix the code.
>
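To make the SIGTERM example above concrete: a test can wait for the process to actually exit instead of assuming it did, and fall back to SIGKILL only if it does not. The following is a minimal sketch of that variation in plain shell. terminate_and_wait is a hypothetical helper name, the 5-second default timeout is an assumption, and the sketch assumes the caller is allowed to signal the process. As noted above, even SIGKILL will not help for a process in D state, in which case the helper reports failure.

```shell
# Hypothetical helper: send SIGTERM, then wait for the process to actually
# exit instead of assuming it did. Escalates to SIGKILL after the timeout.
# Returns 0 if the process is gone, 1 if it is still present.
terminate_and_wait() {
    local pid="$1" timeout="${2:-5}" waited=0   # 5 s default is an assumption

    # Ask politely first; errors (e.g. process already gone) are ignored.
    kill -TERM "$pid" 2>/dev/null

    # Poll once per second until the process disappears or we time out.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            sleep 1    # give the kernel a moment to tear the process down
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Reap the process if it is our own child; harmless otherwise.
    wait "$pid" 2>/dev/null

    # Success only if the process can no longer be signalled.
    ! kill -0 "$pid" 2>/dev/null
}
```

A test would then assert on the return value of the helper rather than on the fact that a signal was sent.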
> Note to all maintainers:
>
>    - I would request everyone to resist merging patches this weekend
>    unless critically required. It would help us in debugging on Monday.
>
>
I did try this over the weekend. Refer to the patch at
http://review.gluster.org/#/c/12109/.

However, I discovered that once certain tests failed in a run, the tests
that ran after them kept failing as well, which indicates that our cleanup
function is not sufficient/complete.
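
One way to narrow down where such a cascade starts is to look for the point in a run after which nothing passed. Below is a minimal sketch, assuming a plain-text result log with one "<test name> <PASS|FAIL>" line per test in run order. Both this log format and the name trailing_failure_start are hypothetical; run-tests.sh does not produce such a log as far as this mail is concerned.

```shell
# Hypothetical helper: read "<test name> <PASS|FAIL>" lines on stdin (run
# order) and print the first test of the trailing streak of failures, if
# that streak is at least two tests long. Prints nothing otherwise.
trailing_failure_start() {
    awk '
        $2 == "PASS" { start = ""; n = 0 }          # a pass ends any streak
        $2 == "FAIL" { if (n == 0) start = $1; n++ } # remember where it began
        END { if (n >= 2) print start }
    '
}
```

The test printed by this helper is the first suspect for leaving state behind that the cleanup function did not remove.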

I will be working on fixing a few functions in run-tests.sh and include.rc
before coming back to this next weekend.


>
> Let's hope that when we do a similar jenkins run next weekend, September
> 12th, we don't find any failures.
>
> Suggestions welcome for any changes in the above plan.
>
> Thanks,
> Raghavendra Talur
>

_______________________________________________
Gluster-devel mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-devel
