On Fri, Sep 4, 2015 at 12:56 PM, Raghavendra Talur <[email protected]> wrote:
>> Maintainers - can you please take stock of this and ensure sanity of your
>> components before merging patches that do not fix a failing test?
>
> Here is my proposal to get this fixed.
>
> This weekend, 5th September 0400 UTC, I will start a jenkins run on master
> and 3.7 branches.
>
>    - It will be rebased onto the latest code just before it is run, so all
>      patches merged by 4th September will be tested.
>    - It will run each test 10 times in succession. Why 10?
>       - I hope to find tests that fail occasionally.
>       - If a test fails only on the 1st run, it could very well be a
>         cleanup issue with the previously run test.
>       - Failures within the 10 runs that follow a pattern are again
>         indicative of some cleanup/timeout error.
>    - It will run all tests and not stop at the first failure.
>    - I will have scripts modified to get maximum data from logs. (It will
>      still be INFO level logs.)
>
> After the test completes, I will file a bug against the component of the
> .t tests that fail in this run and immediately add the test to the bad
> tests list.
>
> What should the maintainers do after that?
>
>    - If a bug is filed against your component, please spend some time on
>      Monday and root cause the issue by Monday EOD.
>    - If the root cause proves that the bug is in the .t file:
>       - It would mostly be because:
>          - The timeouts are not always enough. Change EXPECT_WITHIN
>            values and check.
>          - The test is not deterministic enough; some of the assumptions
>            that the test makes might not always be true. For example, a
>            SIGTERM followed by a TEST which assumes that the process is
>            definitely killed is a wrong assumption. Use SIGKILL in such
>            cases. (I know SIGKILL may not work either if the process is
>            in D state, but it's a good enough example.)
>       - It is easier to fix bugs in the .t file once the root cause is
>         found. Please fix the issue and remove the test from the bad tests
>         list. Use the bug filed against this .t file.
>    - If the root cause proves that the bug is in Gluster code:
>       - If the bug is in the same component as the .t file:
>          - In this case, you are the component owner; change the
>            description and summary of the bug filed to indicate the
>            actual issue.
>          - If the time required to fix the issue in Gluster code is
>            non-minimal:
>             - Put a workaround in the .t file with a comment clearly
>               stating the bug number which would later fix it, and remove
>               the test from the bad tests list.
>             - If a workaround is not possible, let the test remain in the
>               bad tests list.
>       - If the bug is not in the same component as the .t file:
>          - Update the bug with details which prove that the bug is not in
>            the same component and change the component accordingly.
>          - It is the new owner's responsibility to provide a workaround
>            for all .t files hit by the issue and fix the code.
>
> Note to all maintainers:
>
>    - I would request everyone to resist merging patches this weekend
>      unless critically required. It would help us in debugging on Monday.
>

I did try this over the weekend. Refer to the patch at
http://review.gluster.org/#/c/12109/. However, I discovered that once
certain tests failed in a run, the tests after them kept failing, which
indicates that our cleanup function is incomplete. I will work on fixing
a few functions in run-tests.sh and include.rc before coming back to this
next weekend.

> Let's hope that when we do a similar jenkins run next weekend, September
> 12th, we don't find any failures.
>
> Suggestions welcome for any changes in the above plan.
>
> Thanks,
> Raghavendra Talur
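
For reference, the "run each test 10 times and look at the failure pattern"
idea from the plan could be sketched roughly as below. This is only an
illustration under my own assumptions: run_repeated and classify_runs are
hypothetical helper names, not functions that exist in run-tests.sh or
include.rc, and the category names are made up for this example.

```shell
# Hypothetical helper: run a command N times in succession, never
# stopping at the first failure, and print one letter per run
# (P = pass, F = fail). The body is a subshell so variables stay local.
run_repeated() (
    runs=$1; shift
    results=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        if "$@" >/dev/null 2>&1; then
            results="${results}P"
        else
            results="${results}F"
        fi
        i=$((i + 1))
    done
    echo "$results"
)

# Hypothetical classifier for a P/F string, following the plan's reasoning:
#   first-run-only -> suspect leftover state from the previously run test
#   intermittent   -> suspect a cleanup/timeout problem
#   always-fails   -> consistent failure
#   stable         -> no failure seen
classify_runs() {
    case "$1" in
        *P*F*) echo "intermittent" ;;   # a failure after at least one pass
        FF*P*) echo "intermittent" ;;   # several leading failures, then passes
        FP*)   echo "first-run-only" ;; # only the very first run failed
        *F*)   echo "always-fails" ;;   # no run passed
        *)     echo "stable" ;;
    esac
}
```

It would be used along the lines of
classify_runs "$(run_repeated 10 prove ./tests/basic/example.t)", where the
test path is a placeholder and not a real file.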
_______________________________________________
Gluster-devel mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-devel
