J.Pietschmann wrote:

> I tried to produce a concept for some automated JUnit test, with
> the intent to quickly uncover regressions during wholesale
> refactoring.
> I came up with
>   http://cvs.apache.org/~pietsch/FopTest.java
> sample control file at
>   http://cvs.apache.org/~pietsch/regression.xml
>
> Overview: the control file holds a source, an MD5 for it (so you
> can more easily tell test failures caused by a changed source
> from actual regressions), and an MD5 for the result file. If the
> test runs through, all is well. If a test fails, you can
> investigate
> the file and see whether the change was a regression (fix it)
> or an improvement (update the MD5 for the result).

Awesome.
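For reference, the digest side of this is pleasantly small. A minimal sketch of hashing a result file with java.security (class and method names are mine, not taken from FopTest.java):

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: compute an MD5 hex digest of a rendered result stream. */
public class Md5Check {
    public static String md5Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        // DigestInputStream updates the digest as the bytes pass through
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            while (dis.read(buf) != -1) {
                // nothing to do; reading feeds the digest
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The hex string can then be compared directly against the value stored in the control file.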

> Problems:
> - PDFInfo unconditionally puts the creation time into the
>    PDF. This thwarts the whole thing. On my machine I can
>    disable it temporarily, but there should be a more
>    sustainable solution. Ideas:
>    o pass a flag to the renderer which suppresses the creation
>      time
>    o pass a creation date value (can be abused, but abusers can
>      implement it anyway)
>    o patch it in the result array before digesting (hack alert)

The second choice makes the most sense to me. There are other non-abusive uses
for an artificial creation date -- for example, creating a collection of
user documentation files that all have the same date/time stamp as part of a
release.
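For what it's worth, PDF's date notation is D:YYYYMMDDHHmmSS plus an optional timezone suffix, so however the value reaches PDFInfo, formatting it is the easy part. A sketch (the class name is mine, and I'm assuming the supplied date is normalized to UTC; how the value is threaded into the renderer is a separate question):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Sketch: render an externally supplied date in PDF CreationDate notation. */
public class PdfDate {
    public static String format(Date date) {
        // "Z" marks UTC per the PDF date syntax
        SimpleDateFormat fmt = new SimpleDateFormat("'D:'yyyyMMddHHmmss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }
}
```

A test run would then pass one fixed Date for every build, making the digest reproducible.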

> - Source FO line endings: both CVS and ZIP may alter them,
>    making the source MD5 invalid. I'm not sure whether FixCRLF
>    can be of use here. Either way, running the tests from Eclipse
>    unprepared could be a bad idea. Possible fixes:
>    o have two MD5s in the control file, one for the source with
>      CRLF, one with LF only. Makes updating more inconvenient.
>    o use another FilterStream to transform CRLF->LF before
>      digesting. Adds unwanted complexity, but probably the way
>      to go...

I agree that the second choice is better. I haven't had time to explore the
line-ending issue with Eclipse. I'm guessing that most of the Eclipse users
on this list are using it with Linux? If so, line endings are LF already and
it's a non-issue. If not, they must already have some scheme for converting
to LF -- how else would they get code checked in?
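A CRLF->LF FilterInputStream along those lines could look something like this (a rough sketch, not wired into FopTest.java; it would simply be wrapped around the source stream before the DigestInputStream):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Sketch: normalize CRLF (and lone CR) to LF before the bytes are digested. */
public class LfFilterInputStream extends FilterInputStream {
    public LfFilterInputStream(InputStream source) {
        // one byte of pushback is enough to peek past a CR
        super(new PushbackInputStream(source, 1));
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c == '\r') {
            int next = super.read();
            if (next != '\n' && next != -1) {
                // not a CRLF pair: put the byte back for the next read
                ((PushbackInputStream) in).unread(next);
            }
            return '\n'; // CRLF collapses to LF; a lone CR becomes LF too
        }
        return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // byte-at-a-time keeps the sketch simple; fine for test-sized files
        int i = 0;
        for (; i < len; i++) {
            int c = read();
            if (c == -1) {
                return i == 0 ? -1 : i;
            }
            b[off + i] = (byte) c;
        }
        return i;
    }
}
```

With that in place a single MD5 in the control file covers both line-ending variants of the checked-out source.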

> - Hidden regressions: a checksum mismatch does not necessarily
>    cause a visible problem, let's say the author string gets spaces
>    appended or such. For proper inspection of failures we probably
>    need a more sophisticated tool than simply displaying two PDFs
>    side-by-side. For a starter, a sort of PDF diff which extracts
>    the streams, uncompresses and displays mismatches with a bit of
>    context would certainly be valuable. Any takers?

Interesting idea. Maybe it is (almost) as good (but not as much fun) to let
the developer convert each PDF file to PostScript and diff the PostScript
files. Or perhaps to use the PostScript output option in the first place if
you have a hidden difference. It might actually be a better use of resources
to beef up the PostScript output and add pdfmarks to it, i.e. to make sure
that FO --> PostScript (with pdfmarks) --> PDF (using Distiller) produces
the same results as FO --> PDF. That might be tricky or even impossible,
but if it worked, then PostScript with pdfmarks could be used as the input
to diff, with the added benefit that our PostScript output would be forced
to keep pace with our PDF output.
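If someone does take up the PDF-diff idea, at least the decompression step is free: FlateDecode stream bodies inflate with stock java.util.zip, no extra libraries needed. A sketch (extracting the stream boundaries from the PDF file is the part left out here):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Sketch: inflate a FlateDecode-compressed PDF stream body for diffing. */
public class StreamInflater {
    public static byte[] inflate(byte[] deflated) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(deflated);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            if (n == 0 && inf.needsInput()) {
                break; // truncated input; stop rather than loop forever
            }
            out.write(buf, 0, n);
        }
        inf.end();
        return out.toByteArray();
    }
}
```

The inflated content streams are plain text operators, so an ordinary textual diff with a few lines of context gets you most of the way.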

You are definitely on a useful track here.

Victor Mote

