OK for me too.

Simon

On 9/3/18 10:31 AM, Thomas Mortagne wrote:
Sounds good.

On Mon, Sep 3, 2018 at 9:55 AM, Vincent Massol <[email protected]> wrote:

On 3 Sep 2018, at 09:55, Vincent Massol <[email protected]> wrote:

I propose to do this tomorrow, Tuesday, starting with an intro from me, using 
YouTube Live.

Say, 10AM Paris time.

Thanks
-Vincent

WDYT?

Thanks
-Vincent

On 30 Aug 2018, at 12:27, Adel Atallah <[email protected]> wrote:

Just to be clear, when I proposed "having a whole day dedicated to
using these tools", I didn't mean having it every week but only once,
so we can properly start improving the tests. It would be some kind of
training.
On my side, I don't think I'll be able to dedicate one day a week to
tests and another to bug fixing: I wouldn't have any time left for the
roadmap, as I will only work on the product 50% of the time.


On Thu, Aug 30, 2018 at 12:18 PM, Vincent Massol <[email protected]> wrote:
Hi,

I don’t remember discussing this with you, Thomas. Actually I’m not convinced 
we should have a fixed day:
* we already have a fixed BFD, and having a second fixed day doesn’t leave 
much flexibility for working on roadmap items when the timing is best
* test sessions can be short (0.5-1 hour) and it’s easy to do them between 
other tasks
* it can be boring to spend a full day on them

Now, I agree that not having a fixed day will make it hard to ensure that we 
actually spend 20% of our time on that topic.

So if you prefer, we can define a day, knowing that some won’t always be able 
to attend on that day; in that case they should do it on another day. What’s 
important is to get the 20% done each week (i.e. enough work done on it).

In terms of day, if we have to choose one, I’d say Tuesday. That seems the 
most logical choice to me.

WDYT? What do you prefer?

Thanks
-Vincent

On 30 Aug 2018, at 10:38, Thomas Mortagne <[email protected]> wrote:

Indeed we discussed this, but I don't see it in your mail, Vincent.

On Thu, Aug 30, 2018 at 10:33 AM, Adel Atallah <[email protected]> wrote:
Hello,

Maybe we should agree on having a whole day dedicated to using these
tools, with as many developers as possible.
That way we will be able to help each other, and maybe it will make the
process easier to carry out in the future.

WDYT?

Thanks,
Adel


On Wed, Aug 29, 2018 at 11:20 AM, Vincent Massol <[email protected]> wrote:
Hi devs (and anyone else interested to improve the tests of XWiki),

History
======

It all started when I analyzed our global TPC (Clover’s Total Percentage 
Coverage) and found that it was going down globally even though we have the 
fail-build-on-jacoco-threshold strategy.

I sent several email threads:

1) Loss of TPC: http://markmail.org/message/hqumkdiz7jm76ya6
2) TPC evolution: http://markmail.org/message/up2gc2zzbbe4uqgn
3) Improve our TPC strategy: http://markmail.org/message/grphwta63pp5p4l7

Note: As a consequence of this last thread, I implemented a Jenkins Pipeline 
to send us a mail when the global TPC of an XWiki module goes down, so that we 
fix it ASAP. This is still a work in progress: a first version is done and 
running at https://ci.xwiki.org/view/Tools/job/Clover/ but I need to debug and 
fix it (it’s not working ATM).

As a result of the global TPC going down/stagnating, I proposed to have XWiki 
10.7 focus on Tests + BFD.
- Initially I proposed to focus on increasing the global TPC by looking at the 
reports from 1) above (http://markmail.org/message/qjemnip7hjva2rjd). See the 
last report at https://up1.xwikisas.com/#mJ0loeB6nBrAgYeKA7MGGw (we need to fix 
the red parts).
- Then, with the STAMP mid-term review, a bigger urgency surfaced and I asked 
if we could instead focus on fixing tests as reported by Descartes, to 
increase both coverage and mutation score (i.e. test quality): those are 2 
metrics/KPIs measured by STAMP, and since XWiki participates in STAMP we need 
to work on them and increase them substantially. See 
http://markmail.org/message/ejmdkf3hx7drkj52

The results of XWiki 10.7 have been quite poor on test improvements (more 
focus on BFD than on tests, lots of devs on holiday, etc.). This forces us to 
adopt a different strategy.

Full Strategy proposal
=================

1) As many XWiki SAS devs as possible (and anyone else from the community who’s 
interested ofc! :)) should spend 1 day per week working on improving STAMP 
metrics
* Currently the agreement is that Thomas and I will do this for the 
foreseeable future, until we get some good-enough metric progress
* Some other devs from XWiki SAS will help out for XWiki 10.8 only FTM (Marius, 
Adel if he can, Simon in the future). The idea is to see where that could get 
us by using substantial manpower.

2) All committers: More generally, the global TPC check is also already 
active, and devs need to fix the modules whose global TPC goes down.

3) All committers: Of course, the JaCoCo strategy is also active at each 
module level.

STAMP tools
==========

There are 4 tools developed by STAMP:
* Descartes: Improves the quality of tests by increasing their mutation score. 
See http://markmail.org/message/bonb5f7f37omnnog and also 
https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes (see 
the short illustration below).
* DSpot: Automatically generates new tests, based on existing tests. See 
https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot (also 
illustrated below).
* CAMP: Takes a Dockerfile and generates mutations of it, then deploys the 
software and executes the tests on it to see if the mutation works or not. 
Note that this currently doesn’t fit XWiki’s needs, so I’ve been developing 
another tool as an experiment (which may be merged back into CAMP one day), 
based on TestContainers, see 
https://massol.myxwiki.org/xwiki/bin/view/Blog/EnvironmentTestingExperimentations
* EvoCrash: Takes a stack trace from production logs and generates a test 
that, when executed, reproduces the crash. See 
https://markmail.org/message/v74g3tsmflquqwra. See also 
https://github.com/SERG-Delft/EvoCrash

Since XWiki is part of the STAMP research project, we need to use those 4 
tools to increase the KPIs associated with them. See below.
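
To make this more concrete, here’s a minimal, hypothetical illustration of 
what a Descartes finding looks like (the validator code is made up for the 
example, it’s not actual XWiki code, and JUnit 5 is assumed):

package org.xwiki.test.example;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class TitleValidatorTest
{
    /** Hypothetical production code under test, inlined for the example. */
    public static boolean isValid(String title)
    {
        return title != null && !title.trim().isEmpty();
    }

    // Descartes applies "extreme mutations": it replaces the whole body of
    // isValid() with "return true;" and with "return false;". This test only
    // kills the "return false;" mutant, so the method is reported as
    // partially tested.
    @Test
    public void isValidWithValidTitle()
    {
        assertTrue(isValid("My Title"));
    }

    // Adding the negative cases kills the "return true;" mutant as well,
    // which is exactly the kind of improvement K01/K03 ask for.
    @Test
    public void isValidWithEmptyOrNullTitle()
    {
        assertFalse(isValid(""));
        assertFalse(isValid(null));
    }
}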
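
In the same spirit, here’s a hypothetical before/after of what DSpot 
amplification can produce (the amplified test is hand-written to illustrate 
the idea; real DSpot output is more mechanical):

package org.xwiki.test.example;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class PathSerializerTest
{
    /** Hypothetical production code under test, inlined for the example. */
    public static String serializePath(String space, String page)
    {
        return space + "." + page;
    }

    // Existing, human-written test that DSpot would use as a seed.
    @Test
    public void serializePathWithSpaceAndPage()
    {
        assertEquals("Space.Page", serializePath("Space", "Page"));
    }

    // The kind of test DSpot generates from the seed: mutated inputs plus
    // assertions built from the values observed at runtime. Such generated
    // tests would be committed to a separate test source directory (see the
    // K01 strategy below) rather than to src/test/java.
    @Test
    public void serializePathAmplified()
    {
        assertEquals(".Page", serializePath("", "Page"));
        assertEquals("Space.", serializePath("Space", ""));
    }
}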

Objectives/KPIs/Metrics for STAMP
===========================

The STAMP project has defined 9 KPIs that all partners (and thus XWiki) need to 
work on:

1) K01: Increase test coverage
* Global increase by reducing the non-covered code by 40%. For XWiki, since 
we’re at about 70% coverage, cutting the remaining 30% by 40% leaves about 18% 
uncovered, which means reaching about 80% coverage before the end of STAMP 
(i.e. before the end of 2019)
* Increase the coverage contributions of each tool developed by STAMP.

Strategy:
* Primary goal:
** Increase coverage by executing Descartes and improving our tests. This is 
http://markmail.org/message/ejmdkf3hx7drkj52
** Don’t do anything with DSpot; I’ll do that part. Note that the goal is to 
write a Jenkins pipeline to automatically execute DSpot from time to time, 
commit the generated tests in a separate test source directory, and have our 
build execute both src/test/java and this new test source.
** Don’t do anything with TestContainers FTM since I need to finish a first 
working version. I may need help in the future to implement Docker images for 
more configurations (on Oracle, in a cluster, with LibreOffice, with an 
external SOLR server, etc.); see the sketch after this list.
** For EvoCrash: We’ll count contributions of EvoCrash to coverage in K08.
* Secondary goal:
** Increase our global TPC as mentioned above by fixing the modules in red.
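
For those who haven’t played with TestContainers yet, here’s a minimal sketch 
of the idea (the mysql:5.7 image and the wiring are placeholders for the 
example, not the actual experiment linked above):

package org.xwiki.test.example;

import org.junit.jupiter.api.Test;
import org.testcontainers.containers.GenericContainer;

import static org.junit.jupiter.api.Assertions.assertTrue;

public class ContainerConfigurationTest
{
    @Test
    public void runAgainstContainerizedDatabase()
    {
        // Start a throwaway database container for this test run.
        GenericContainer<?> db = new GenericContainer<>("mysql:5.7")
            .withEnv("MYSQL_ROOT_PASSWORD", "xwiki")
            .withExposedPorts(3306);
        db.start();
        try {
            // A real test would configure XWiki with this JDBC URL and then
            // run the existing functional tests against that configuration.
            String jdbcUrl = String.format("jdbc:mysql://%s:%d/xwiki",
                db.getContainerIpAddress(), db.getMappedPort(3306));
            assertTrue(db.isRunning(), jdbcUrl);
        } finally {
            db.stop();
        }
    }
}

The interesting part is that the container lifecycle is driven from the test 
itself, so adding a new configuration mostly means adding a new image and its 
wiring.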

2) K02: Reduce flaky tests.
* Objective: reduce the number of flaky tests by 20%

Strategy:
* Record flaky tests in JIRA
* Fix as many of them as possible

3) K03: Better test quality
* Objective: increase mutation score by 20%

Strategy:
* Same strategy as K01.

4) K04: More configuration-related paths tested
* Objective: increase the code coverage of configuration-related paths in our 
code by 20% (e.g. DB schema creation code, cluster-related code, SOLR-related 
code, LibreOffice-related code, etc.).

Strategy:
* Leave it FTM. The idea is to measure the Clover TPC with the base 
configuration, then execute all the other configurations (with TestContainers) 
and regenerate the Clover report to see how much the TPC has increased.

5) K05: Reduce system-specific bugs
* Objective: 30% improvement

Strategy:
* Run TestContainers, execute the existing tests and find new bugs related to 
configurations. Record them.

6) K06: More configurations/Faster tests
* Objective: increase the number of automatically tested configurations by 50%

Strategy:
* Increase the number of configurations we test with TestContainers. I’ll do 
that part initially.
* Reduce the time it takes to deploy the software under a given configuration 
compared to the time it used to take when done manually, before STAMP. I’ll do 
this one; I’ve already worked on it over the past year with the dockerization 
of XWiki.

7) K07: Pending, nothing to do FTM

8) K08: More crash-replicating test cases
* Objective: increase the number of crash-replicating test cases by at least 
70%

Strategy:
* For all issues that are still open and have stack traces, and for all 
issues closed but without tests, run EvoCrash on them to try to generate a 
test.
* Record and count the number of successful EvoCrash-generated test cases.
* Derive a regression test (which can be very different from the negative of 
the test generated by EvoCrash!); see the sketch after this list.
* Measure the new coverage increase.
* Note that I haven’t experimented much with this yet myself.
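
To illustrate the last two points, here’s a hypothetical sketch (the class, 
the bug and the expected fixed behavior are all invented for the example) of 
how the crash-reproducing test differs from the regression test we derive 
from it:

package org.xwiki.test.example;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

public class AttachmentNameTest
{
    /** Hypothetical production code as it was when the crash was logged. */
    public static String extensionBuggy(String fileName)
    {
        // Bug: no null check, so extensionBuggy(null) throws an NPE.
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    /** The same method after the bug fix. */
    public static String extension(String fileName)
    {
        if (fileName == null) {
            return "";
        }
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    // The kind of test EvoCrash generates from the production stack trace:
    // it passes when the logged crash is reproduced.
    @Test
    public void crashReproducingTest()
    {
        assertThrows(NullPointerException.class, () -> extensionBuggy(null));
    }

    // The regression test derived afterwards: it pins down the intended
    // behavior of the fixed method, which is not simply "no exception"
    // (returning "" on null is an assumption made for this sketch).
    @Test
    public void extensionWithNullFileName()
    {
        assertEquals("", extension(null));
    }
}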

9) K09: Pending, nothing to do FTM.

Conclusion
=========

Right now, I need your help for the following KPIs: K01, K02, K03, K08.

Since there’s a lot to understand in this email, I’m open to:
* Organizing a meeting on YouTube Live to discuss all this
* Answering any questions on this thread ofc
* Also feel free to ask on IRC/Matrix.

Here’s an extract from STAMP which has more details about the KPIs/metrics:
https://up1.xwikisas.com/#QJyxqspKXSzuWNOHUuAaEA

Thanks
-Vincent

--
Thomas Mortagne

--
Simon Urli
Software Engineer at XWiki SAS
[email protected]
More about us at http://www.xwiki.com
