Just to be clear, when I proposed "having a whole day dedicated to
using these tools", I didn't mean having it every week but only once,
so we can properly start improving the tests. It would be some kind of
training.
On my side, I don't think I can spend one day a week on tests and
another on bug fixing: I wouldn't have any time left for the roadmap,
since I will only be working on the product 50% of the time.


On Thu, Aug 30, 2018 at 12:18 PM, Vincent Massol <[email protected]> wrote:
> Hi,
>
> I don’t remember discussing this with you, Thomas. Actually, I’m not convinced
> we should have a fixed day:
> * we already have a fixed BFD, and having a second fixed day doesn’t leave much
> flexibility for working on roadmap items when that’s the best moment for them
> * test sessions can be short (0.5-1 hour) and it’s easy to fit them in between
> other tasks
> * it can be boring to spend a full day on them
>
> Now, I agree that not having a fixed day will make it hard to ensure that we
> really spend 20% of our time on that topic.
>
> So if you prefer, we can define a day, knowing that some people won’t always be
> able to work on it that day; in that case they should do it on another day.
> What’s important is that 20% gets done each week (i.e. enough work is done on
> it).
>
> In terms of day, if we have to choose one, I’d say Tuesday. That seems the most
> logical to me.
>
> WDYT? What do you prefer?
>
> Thanks
> -Vincent
>
>> On 30 Aug 2018, at 10:38, Thomas Mortagne <[email protected]> wrote:
>>
>> Indeed we discussed this, but I don't see it in your mail, Vincent.
>>
>> On Thu, Aug 30, 2018 at 10:33 AM, Adel Atallah <[email protected]> 
>> wrote:
>>> Hello,
>>>
>>> Maybe we should agree on having a whole day dedicated to using these
>>> tools, with as many developers as possible.
>>> That way we will be able to help each other, and maybe it will make the
>>> process easier to carry out in the future.
>>>
>>> WDYT?
>>>
>>> Thanks,
>>> Adel
>>>
>>>
>>> On Wed, Aug 29, 2018 at 11:20 AM, Vincent Massol <[email protected]> wrote:
>>>> Hi devs (and anyone else interested to improve the tests of XWiki),
>>>>
>>>> History
>>>> ======
>>>>
>>>> It all started when I analyzed our global TPC (Test Percentage Coverage) and
>>>> found that it was going down even though we have the
>>>> fail-build-on-jacoco-threshold strategy.
>>>>
>>>> I sent several email threads:
>>>>
>>>> - Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
>>>> - TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
>>>> - Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7
>>>>
>>>> Note: As a consequence of this last thread, I implemented a Jenkins
>>>> Pipeline that sends us a mail when the global TPC of an XWiki module goes
>>>> down, so that we can fix it ASAP. This is still a work in progress: a
>>>> first version is done and running at
>>>> https://ci.xwiki.org/view/Tools/job/Clover/ but I still need to debug and
>>>> fix it (it’s not working ATM).
>>>>
>>>> As a result of the global TPC going down/stagnating, I proposed to focus
>>>> XWiki 10.7 on Tests + BFD.
>>>> - Initially I proposed to focus on increasing the global TPC by looking at 
>>>> the reports from 1) above (http://markmail.org/message/qjemnip7hjva2rjd). 
>>>> See the last report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw 
>>>> (we need to fix the red parts).
>>>> - Then, with the STAMP mid-term review, a bigger urgency surfaced and I
>>>> asked if we could instead focus on fixing tests as reported by Descartes, to
>>>> increase both coverage and mutation score (i.e. test quality). Those are 2
>>>> metrics/KPIs measured by STAMP, and since XWiki participates in STAMP we
>>>> need to work on them and increase them substantially. See
>>>> http://markmail.org/message/ejmdkf3hx7drkj52
>>>>
>>>> The results of XWiki 10.7 have been quite poor on test improvements (more
>>>> focus on BFD than on tests, lots of devs on holidays, etc.). This forces us
>>>> to adopt a different strategy.
>>>>
>>>> Full Strategy proposal
>>>> =================
>>>>
>>>> 1) As many XWiki SAS devs as possible (and anyone else from the community 
>>>> who’s interested ofc! :)) should spend 1 day per week working on improving 
>>>> STAMP metrics
>>>> * Currently the agreement is that Thomas and I will do this for the
>>>> foreseeable future, until we get some good-enough progress on the metrics.
>>>> * Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM 
>>>> (Marius, Adel if he can, Simon in the future). The idea is to see where 
>>>> that could get us by using substantial manpower.
>>>>
>>>> 2) All committers: More generally, the global TPC failure check is also
>>>> already active, and devs need to fix the modules that see their global TPC
>>>> go down.
>>>>
>>>> 3) All committers: Of course, the jacoco strategy is also active at each 
>>>> module level.
>>>>
>>>> STAMP tools
>>>> ==========
>>>>
>>>> There are 4 tools developed by STAMP:
>>>> * Descartes: Improves quality of tests by increasing their mutation 
>>>> scores. See http://markmail.org/message/bonb5f7f37omnnog and also 
>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
>>>> * DSpot: Automatically generates new tests, based on existing tests. See
>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot
>>>> * CAMP: Takes a Dockerfile and generates mutations of it, then deploys and
>>>> executes tests on the software to see if the mutation works or not. Note
>>>> that this currently doesn’t fit XWiki’s needs, and thus I’ve been
>>>> developing another tool as an experiment (which may go back into CAMP one
>>>> day), based on TestContainers, see
>>>> https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations
>>>> * EvoCrash: Takes a stack trace from production logs and generates a test 
>>>> that, when executed, reproduces the crash. See 
>>>> https://markmail.org/message/v74g3tsmflquqwra. See also 
>>>> https://github.com/SERG-Delft/EvoCrash
>>>>
>>>> Since XWiki is part of the STAMP research project, we need to use those 4 
>>>> tools to increase the KPIs associated with the tools. See below.
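>>>>
>>>> To give an idea of what fixing a Descartes report looks like in practice,
>>>> here is a minimal, hypothetical JUnit 4 sketch (UserManager and the tests
>>>> are made up). Descartes flags a method as “pseudo-tested” when its body can
>>>> be replaced by an extreme value (void, 0, null, etc.) without any test
>>>> failing; the fix is usually to strengthen the assertions:
>>>>
>>>> import org.junit.Test;
>>>>
>>>> import static org.junit.Assert.assertEquals;
>>>>
>>>> public class UserManagerTest
>>>> {
>>>>     private UserManager userManager = new UserManager();
>>>>
>>>>     // Before: only checks that no exception is thrown. Descartes reports
>>>>     // countUsers() as pseudo-tested since replacing its body with
>>>>     // "return 0" doesn't fail this test.
>>>>     @Test
>>>>     public void countUsersDoesNotFail()
>>>>     {
>>>>         this.userManager.countUsers();
>>>>     }
>>>>
>>>>     // After: asserting the returned value kills the extreme mutants
>>>>     // ("return 0", "return 1", ...) and raises the mutation score (K03)
>>>>     // without necessarily changing line coverage.
>>>>     @Test
>>>>     public void countUsersReturnsNumberOfRegisteredUsers()
>>>>     {
>>>>         this.userManager.registerUser("Alice");
>>>>         this.userManager.registerUser("Bob");
>>>>         assertEquals(2, this.userManager.countUsers());
>>>>     }
>>>> }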
>>>>
>>>> Objectives/KPIs/Metrics for STAMP
>>>> ===========================
>>>>
>>>> The STAMP project has defined 9 KPIs that all partners (and thus XWiki) 
>>>> need to work on:
>>>>
>>>> 1) K01: Increase test coverage
>>>> * Global increase by reducing the non-covered code by 40%. For XWiki, since
>>>> we’re at about 70%, this means reaching about 80% before the end of STAMP
>>>> (i.e. before the end of 2019)
>>>> * Increase the coverage contributions of each tool developed by STAMP.
>>>>
>>>> Strategy:
>>>> * Primary goal:
>>>> ** Increase coverage by executing Descartes and improving our tests. This 
>>>> is http://markmail.org/message/ejmdkf3hx7drkj52
>>>> ** Don’t do anything with DSpot. I’ll do that part. Note that the goal is
>>>> to write a Jenkins pipeline that automatically executes DSpot from time to
>>>> time, commits the generated tests in a separate test source directory, and
>>>> has our build execute both src/test/java and this new test source.
>>>> ** Don’t do anything with TestContainers FTM since I need to finish a 
>>>> first working version. I may need help in the future to implement docker 
>>>> images for more configurations (on Oracle, in a cluster, with LibreOffice, 
>>>> with an external SOLR server, etc).
>>>> ** For EvoCrash: We’ll count contributions of EvoCrash to coverage in K08.
>>>> * Secondary goal:
>>>> ** Increase our global TPC as mentioned above by fixing the modules in red.
>>>>
>>>> 2) K02: Reduce flaky tests.
>>>> * Objective: reduce the number of flaky tests by 20%
>>>>
>>>> Strategy:
>>>> * Record flaky tests in jira
>>>> * Fix as many of them as possible
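>>>>
>>>> For illustration, one common source of flakiness in functional tests is
>>>> waiting with a fixed sleep instead of waiting for a condition. A minimal
>>>> hypothetical sketch (the locator and method names are made up; only the
>>>> pattern matters):
>>>>
>>>> import org.openqa.selenium.By;
>>>> import org.openqa.selenium.WebDriver;
>>>> import org.openqa.selenium.support.ui.ExpectedConditions;
>>>> import org.openqa.selenium.support.ui.WebDriverWait;
>>>>
>>>> public class WaitExamples
>>>> {
>>>>     // Flaky: assumes the asynchronous save always finishes within 2
>>>>     // seconds; it fails randomly on a slow CI agent and wastes time on a
>>>>     // fast one.
>>>>     void waitForSaveFlaky() throws InterruptedException
>>>>     {
>>>>         Thread.sleep(2000);
>>>>     }
>>>>
>>>>     // More robust: poll for the actual condition with a bounded timeout
>>>>     // ("saveSuccess" is a made-up locator, just for illustration).
>>>>     void waitForSave(WebDriver driver)
>>>>     {
>>>>         new WebDriverWait(driver, 20).until(
>>>>             ExpectedConditions.visibilityOfElementLocated(By.id("saveSuccess")));
>>>>     }
>>>> }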
>>>>
>>>> 3) K03: Better test quality
>>>> * Objective: increase mutation score by 20%
>>>>
>>>> Strategy:
>>>> * Same strategy as K01.
>>>>
>>>> 4) K04: More configuration-related paths tested
>>>> * Objective: increase the code coverage of configuration-related paths in
>>>> our code by 20% (e.g. DB schema creation, cluster-related code,
>>>> SOLR-related code, LibreOffice-related code, etc.).
>>>>
>>>> Strategy:
>>>> * Leave it FTM. The idea is to measure the Clover TPC with the base
>>>> configuration, then execute all other configurations (with TestContainers)
>>>> and regenerate the Clover report to see how much the TPC has increased.
>>>>
>>>> 5) K05: Reduce system-specific bugs
>>>> * Objective: 30% improvement
>>>>
>>>> Strategy:
>>>> * Run TestContainers, execute the existing tests and find new bugs related
>>>> to specific configurations. Record them.
>>>>
>>>> 6) K06: More configurations/Faster tests
>>>> * Objective: increase the number of automatically tested configurations by 
>>>> 50%
>>>>
>>>> Strategy:
>>>> * Increase the # of configurations we test with TestContainers. I’ll do 
>>>> that part initially.
>>>> * Reduce the time it takes to deploy the software under a given
>>>> configuration vs the time it used to take when done manually before STAMP.
>>>> I’ll do this one; I’ve already worked on it in the past year with the
>>>> dockerization of XWiki.
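>>>>
>>>> To illustrate the TestContainers approach (this is not the actual framework
>>>> I’m writing, just a minimal hypothetical sketch with made-up names), adding
>>>> a configuration is roughly one more container definition:
>>>>
>>>> import org.junit.ClassRule;
>>>> import org.junit.Test;
>>>> import org.testcontainers.containers.MySQLContainer;
>>>>
>>>> public class DatabaseConfigurationTest
>>>> {
>>>>     // Starts a throwaway MySQL instance in Docker before the tests and
>>>>     // stops it afterwards; switching the tested configuration is mostly a
>>>>     // one-line change (PostgreSQLContainer, OracleContainer, etc.).
>>>>     @ClassRule
>>>>     public static MySQLContainer mysql = new MySQLContainer("mysql:5.7");
>>>>
>>>>     @Test
>>>>     public void executeTestsAgainstThisConfiguration()
>>>>     {
>>>>         // Hypothetical helper: point the existing functional tests at the
>>>>         // container instead of the default embedded database.
>>>>         runExistingTests(mysql.getJdbcUrl(), mysql.getUsername(),
>>>>             mysql.getPassword());
>>>>     }
>>>>
>>>>     private void runExistingTests(String jdbcUrl, String username,
>>>>         String password)
>>>>     {
>>>>         // ... wire the test setup to the given database configuration ...
>>>>     }
>>>> }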
>>>>
>>>> 7) K07: Pending, nothing to do FTM.
>>>>
>>>> 8) K08: More crash-replicating test cases
>>>> * Objective: increase the number of crash-replicating test cases by at
>>>> least 70%
>>>>
>>>> Strategy:
>>>> * For all issues that are still open and that have stack traces and for 
>>>> all issues closed but without tests, run EvoCrash on them to try to 
>>>> generate a test.
>>>> * Record and count the number of successful EvoCrash-generated test cases.
>>>> * Derive a regression test (which can be very different from the negative
>>>> of the test generated by EvoCrash!).
>>>> * Measure the resulting coverage increase.
>>>> * Note that I haven’t experimented much with this yet myself.
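>>>>
>>>> To make the difference between the two kinds of tests concrete, here is a
>>>> hypothetical JUnit 4 sketch (LinkParser and the expected behaviour are made
>>>> up): the first test is the kind EvoCrash generates to reproduce the crash,
>>>> the second is the regression test derived by hand once the bug is fixed:
>>>>
>>>> import org.junit.Test;
>>>>
>>>> import static org.junit.Assert.assertEquals;
>>>>
>>>> public class LinkParserTest
>>>> {
>>>>     // Crash-reproducing test, as EvoCrash would generate it from a
>>>>     // production stack trace: it only checks that the crash happens.
>>>>     @Test(expected = NullPointerException.class)
>>>>     public void parseCrashesOnNullReference()
>>>>     {
>>>>         new LinkParser().parse(null);
>>>>     }
>>>>
>>>>     // Hand-derived regression test: it asserts the intended behaviour
>>>>     // (here, a null reference is treated as an empty link), which is not
>>>>     // simply the negation of the test above.
>>>>     @Test
>>>>     public void parseReturnsEmptyLinkForNullReference()
>>>>     {
>>>>         assertEquals("", new LinkParser().parse(null));
>>>>     }
>>>> }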
>>>>
>>>> 9) K09: Pending, nothing to do FTM.
>>>>
>>>> Conclusion
>>>> =========
>>>>
>>>> Right now, I need your help for the following KPIs: K01, K02, K03, K08.
>>>>
>>>> Since there’s a lot to understand in this email, I’m open to:
>>>> * Organizing a meeting on YouTube Live to discuss all this
>>>> * Answering any questions on this thread ofc
>>>> * Also feel free to ask on IRC/Matrix.
>>>>
>>>> Here’s an extract from STAMP which has more details about the KPIs/metrics:
>>>> https://up1.xwikisas.com/#QJyxqspKXSzuWNOHUuAaEA
>>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>
>>
>>
>> --
>> Thomas Mortagne
>
