Hi,
the following is a description of what we do at our company.
With our software, we run regression tests after each nightly build and
sometimes it is a tough fight. If there is a regression, it is not so easy
to find which commit caused it, because there are potentially many between
the nightly builds. Then, the decision whether the change is wanted and
expected is in some cases also difficult (this part might be easier with PDF
where there is the "golden standard" rendering in Acrobat). If the change is
expected and the new rendering "better" then one has to commit the new
reference. This means that the files produced on the nightly build machine
must be available somehow - it is almost impossible to produce them locally
as the rendering results differ slightly between versions of
Java, among many other reasons. All this has to be done before the next
regression test is run, so that new regressions are not hidden by earlier
ones. Our complete build with all tests takes several hours...
To improve this workflow, we now use the following scheme in addition:
- there is a smaller set of regression tests which runs relatively fast
- these tests are triggered by each commit in formatting and rendering
related projects
- before running the test itself, the modified project(s) are compiled
locally, without publishing the result to Maven
- the reference rendering files are stored in SVN
- if a test finds a regression, it immediately stores the new result as a
new reference into SVN. This makes sure that a) the test renderings do not
get lost and b) that each regression exactly points to the commit that has
caused it - the one that triggered the test. The failed test creates a new
issue in JIRA with a pointer to SVN to the before and after rendering and a
bitmap of the differences. The issue is then processed. If we find the
change to be expected then the issue is simply closed, otherwise we take
actions to fix the problem. The only annoying thing about this scheme is
that, after committing the correction, the test runs again and reports a
regression because it now compares against the faulty version of the rendering.
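
To make the scheme concrete, here is a minimal sketch of the per-commit
check in Java. It is not our actual code: the class and method names are
made up for illustration, an in-memory map stands in for SVN, and a list of
Report objects stands in for the JIRA issues. Treating a test without any
stored reference as a pass is also just an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommitTriggeredCheck {

    /** One reported change; a stand-in for the JIRA issue (hypothetical type). */
    public static final class Report {
        public final String commitId;
        public final String testName;
        public final byte[] before;
        public final byte[] after;

        Report(String commitId, String testName, byte[] before, byte[] after) {
            this.commitId = commitId;
            this.testName = testName;
            this.before = before;
            this.after = after;
        }
    }

    // Stand-in for the reference renderings kept in SVN.
    private final Map<String, byte[]> references = new HashMap<>();
    private final List<Report> reports = new ArrayList<>();

    /**
     * Compares a rendering produced for one commit against the stored reference.
     * On a mismatch the new rendering immediately becomes the reference, and a
     * report tied to the triggering commit is recorded.
     *
     * @return true if the rendering matched (or no reference existed yet)
     */
    public boolean check(String commitId, String testName, byte[] rendering) {
        byte[] reference = references.get(testName);
        if (reference == null) {
            // Assumption: the first rendering seen simply becomes the reference.
            references.put(testName, rendering.clone());
            return true;
        }
        if (Arrays.equals(reference, rendering)) {
            return true;
        }
        // Keep both renderings so the issue can point to "before" and "after".
        reports.add(new Report(commitId, testName, reference, rendering.clone()));
        references.put(testName, rendering.clone());
        return false;
    }

    public List<Report> reports() {
        return reports;
    }
}
```

Because the reference is replaced on every mismatch, a commit that restores
the old rendering is itself reported as a change. That is exactly the
annoyance mentioned above.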
Best regards,
Petr.
-----Original Message-----
From: John Hewson
Sent: Friday, July 04, 2014 7:39 PM
To: dev@pdfbox.apache.org
Subject: Re: Regression Testing
Hi Tilman
Thanks for your thoughts. I think that your concerns are already covered by
my original proposal; I’ll try to explain why and how:
Of course I agree with the need for regression tests, however it isn't
easy: besides the problems of the different JDKs (I use JDK7 Windows 64
bit), there is the problem that some enhancements create slight changes in
rendering that are not errors, i.e. both the "before" and the "after"
files look OK by themselves. This has happened when we changed the text
rendering recently, and has happened again when the clipping was improved.
The cause is probably slight changes in color or in boundaries.
If a rendering has changed then the regression test should fail. When a
failure occurs the developer needs to manually inspect the differences (we
could generate a visual diff which highlights what changed to make this
easier) and, if they are OK, they can replace the known-good PNGs with the ones
just rendered. Indeed this will be the basic workflow for working with
regression tests.
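
As a rough illustration of the visual diff, here is a minimal sketch using
only java.awt.image.BufferedImage. The class name RenderingDiff and both
method names are made up for this example and are not part of PDFBox. It
assumes an exact pixel comparison, so even the slight color or boundary
changes mentioned above would be flagged.

```java
import java.awt.image.BufferedImage;

public class RenderingDiff {

    /** True if the pixel is missing from either image or has a different value. */
    private static boolean differs(BufferedImage expected, BufferedImage actual, int x, int y) {
        boolean inExpected = x < expected.getWidth() && y < expected.getHeight();
        boolean inActual = x < actual.getWidth() && y < actual.getHeight();
        return !inExpected || !inActual || expected.getRGB(x, y) != actual.getRGB(x, y);
    }

    /** Counts differing pixels over the union of both image areas. */
    public static int countDifferences(BufferedImage expected, BufferedImage actual) {
        int width = Math.max(expected.getWidth(), actual.getWidth());
        int height = Math.max(expected.getHeight(), actual.getHeight());
        int count = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (differs(expected, actual, x, y)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Builds an image with changed pixels in red and unchanged pixels as faded gray. */
    public static BufferedImage highlight(BufferedImage expected, BufferedImage actual) {
        int width = Math.max(expected.getWidth(), actual.getWidth());
        int height = Math.max(expected.getHeight(), actual.getHeight());
        BufferedImage diff = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (differs(expected, actual, x, y)) {
                    diff.setRGB(x, y, 0xFF0000);
                } else {
                    // Fade the unchanged pixel so the red marks stand out.
                    int rgb = actual.getRGB(x, y);
                    int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                    int faded = 128 + gray / 2;
                    diff.setRGB(x, y, (faded << 16) | (faded << 8) | faded);
                }
            }
        }
        return diff;
    }
}
```

A test could fail whenever countDifferences returns a non-zero value and
write the highlight image next to the new rendering for manual inspection.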
Copyright is a problem: I'm testing mostly with JIRA attachments that
I've downloaded over the years. While uploading such files to JIRA might
count as fair use, I doubt that this would still be true if they are
included in a distribution. Instead, they should be stored somewhere on
Apache servers where only committers and build software ("Travis",
"Jenkins", ...) can access them. The public PDFs that Maruan mentions
may not cover all the problem cases that we solved before. However I
have started working with these files and there are at least 5 recent
issues that deal with them.
The PDFs won’t be in a distribution. They will just happen to be stored in
an SVN repo, but not our source code repo, in the same way that the website
is stored in the “cmssite” branch of SVN or, indeed, PDFs are on JIRA. The law
doesn’t distinguish between JIRA and SVN, both are publicly available via
HTTP, so using SVN will simply be a continuation of what we’re already doing
with JIRA.
The crucial factor is that we’re only storing publicly available PDFs,
because we have the right to do so, just like Google’s cache, and like we
currently do with JIRA.
Additionally, the PDFs need to be version controlled, otherwise we won’t be
able to reliably recreate previous builds, so storing the files on a web
server won’t be practical. Also committers will frequently be updating the
renderings as bugs are fixed and we’ll need to version-control the rendered
PNG files for the same reason. Finally, having committers-only files doesn’t
fit well with the Apache goal of open development and would be unnecessary
anyway given that all the PDFs are to be taken from public sources only.
In summary, I’m proposing that we just keep doing what we’re currently doing
with JIRA but we move it into its own SVN repo along with some pre-rendered
PNGs.
Re preflight: the default mode should be to have the Isartor tests on.
Individuals could still disable them locally, but the central build
software should always use them.
Yes - does anybody know why this isn’t the default?
-- John