Hi All
I’ve been thinking about regression testing recently and how we can improve
our tests for rendering. There are currently two problems:
1) Different JDKs produce slightly different renderings (see PDFBOX-1843).
(I suspect that AWT fonts are a big part of this, so the problem might get
a lot better soon once we render all fonts ourselves).
2) Most PDF test files we have are not under an Apache-friendly license, so
we can’t put the test files into the trunk SVN.
It seems that some of you have your own collections of test PDF files which
you are running regression tests on: that’s great, but it would be much
better if we had a central repository of test files and sample renderings.
I’d like to suggest the following solutions to the above issues:
1) We should choose a “blessed” JDK which will be used to perform the
renderings. This should be whatever is a convenient and sensible default for
committers. (My preference would be for Oracle’s JDK 7 because JDK 6 is
deprecated and has known rendering bugs). We should make sure that Jenkins
runs tests using the “blessed” JDK.
The regression test can then check whether it is running on the “blessed”
JDK, and if not, the tests can be skipped and we can warn the user.
2) We should create a new “regression” branch in SVN which contains only PDF
files for testing and PNG images which contain known-good renderings created
using the “blessed” JDK. This branch would not be part of the source of
PDFBox but will still allow us to version control the test PDFs (it also
simplifies the workflow for adding new test PDFs and new known-good
renderings: simply do an “svn add”).
As far as copyright and licensing are concerned, we can put any PDF files
which are publicly available on the web into this branch without too much
worry.
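
To make the two proposals a bit more concrete, here is a rough sketch of what
the check could look like. It is only an illustration, not a finished
implementation: the class name, the vendor and version strings used to
identify the “blessed” JDK, and the exact pixel-for-pixel comparison are all
assumptions on my part, and none of it is existing PDFBox code. It takes a
known-good PNG and a freshly rendered PNG, skips with a warning when not on
the “blessed” JDK, and otherwise reports how many pixels differ.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public final class RenderingRegressionSketch {

    // Assumed identifiers for the "blessed" JDK; nothing has been decided yet,
    // and the real property values should be checked on the chosen JDK.
    static final String BLESSED_VENDOR = "Oracle Corporation";
    static final String BLESSED_SPEC_VERSION = "1.7";

    /** Returns true if the given system property values match the blessed JDK. */
    public static boolean isBlessedJdk(String vendor, String specVersion) {
        return BLESSED_VENDOR.equals(vendor) && BLESSED_SPEC_VERSION.equals(specVersion);
    }

    /** Returns the number of differing pixels, or -1 if the image sizes differ. */
    public static int countDifferingPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            return -1;
        }
        int differing = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                // getRGB returns the pixel as packed ARGB in the default colour model
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println(
                "usage: RenderingRegressionSketch <known-good.png> <new-rendering.png>");
            return;
        }
        // Skip (with a warning) rather than fail when not on the blessed JDK.
        if (!isBlessedJdk(System.getProperty("java.vendor"),
                          System.getProperty("java.specification.version"))) {
            System.err.println(
                "WARNING: not running on the blessed JDK, skipping rendering comparison");
            return;
        }
        BufferedImage expected = ImageIO.read(new File(args[0]));
        BufferedImage actual = ImageIO.read(new File(args[1]));
        if (expected == null || actual == null) {
            // ImageIO.read returns null when no registered reader can decode the file
            System.err.println("Could not decode one of the images");
            return;
        }
        int differing = countDifferingPixels(expected, actual);
        if (differing == 0) {
            System.out.println("PASS");
        } else if (differing < 0) {
            System.out.println("FAIL: image sizes differ");
        } else {
            System.out.println("FAIL: " + differing + " pixels differ");
        }
    }
}
```

In a real JUnit test the skip would more likely be done with an assumption
than with an early return, and we may well want a small tolerance instead of
an exact match; both are open questions.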
What does everybody think?
-- John