Re: [iText-questions] open-source PDF validation

Leonard Rosenthol Thu, 21 Aug 2008 12:01:42 -0700

Right - so the PDF validator in Acrobat Professional does all of the four things you mention, as well as a few other checks and validations (including loading each font and image to see if the stream data is valid for the object in question).

It's not perfect either - of course, but AFAIK, it's the best option out there...


Leonard


On Aug 21, 2008, at 1:17 PM, AJ Weber wrote:

Well I think I can agree with most of your assertions.
However, there should be a pretty thorough way to do at least the following:
1) For each "element" (object, etc.) that exists in the file, check that it is "well formed" according to the PDF Reference.
2) For objects that reference other objects (bookmarks, gotor, etc.), check that these references are valid, and warn where practical (i.e. the ref is outside the PDF, so if it's not immediately found, warn that it may be an invalid ref.)
3) Mandatory elements, tags, objects, etc. are present and in their expected places.
4) PDFs include only valid elements for the PDF Ref version they specify.
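A rough illustration of check 2, for indirect references inside the file only: collect every "N G obj" definition and every "N G R" reference, then report references with no matching definition. This is a minimal sketch, not a parser. It scans raw bytes with regular expressions, so it assumes no object streams or compressed content, and it can be fooled by matching text inside strings or streams. The function name is made up for the example. Note that the PDF Reference treats a reference to an undefined object as a reference to the null object rather than as an error, so the result is best read as a list of warnings.

```python
import re

# "N G obj" starts an indirect object definition; "N G R" is an indirect reference.
OBJ_DEF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")
OBJ_REF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R\b")


def dangling_references(pdf_bytes: bytes) -> set[tuple[int, int]]:
    """Return (object number, generation) pairs that are referenced but never defined.

    Hypothetical helper: a naive byte scan, not a real PDF parser.
    """
    defined = {(int(n), int(g)) for n, g in OBJ_DEF.findall(pdf_bytes)}
    referenced = {(int(n), int(g)) for n, g in OBJ_REF.findall(pdf_bytes)}
    return referenced - defined


sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Outlines 3 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)
print(dangling_references(sample))  # object 3 is referenced but not defined
```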
The example PDF I was having issues with had a vendor give me this explanation as to why they couldn't "read it":
The basic problem was that in "obj" tables after the figures or letters there must be a space code. In this file these tables contained entries followed by a NL + CR and not by a space.
In my validator-spec (above), I would think this would be flagged by #1.
Thanks again,
AJ
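
If the vendor's "obj tables" means the cross-reference table (an assumption; the message does not say), the problem AJ describes is easy to test for. The PDF Reference requires each cross-reference entry to be exactly 20 bytes: a 10-digit offset, a space, a 5-digit generation number, a space, "n" or "f", and a 2-byte end-of-line that is space + CR, space + LF, or CR + LF. An entry ending in NL + CR fits none of these. A minimal sketch of that single check, with a made-up function name:

```python
import re

# 10-digit offset, space, 5-digit generation, space, 'n' or 'f',
# then a 2-byte EOL: SP CR, SP LF, or CR LF.
XREF_ENTRY = re.compile(rb"\d{10} \d{5} [nf](?: \r| \n|\r\n)")


def check_xref_entry(entry: bytes) -> list[str]:
    """Return a list of problems found in one raw cross-reference entry."""
    problems = []
    if len(entry) != 20:
        problems.append(f"entry is {len(entry)} bytes, expected 20")
    if not XREF_ENTRY.fullmatch(entry):
        problems.append("entry is not 'offset generation n|f' plus a valid 2-byte EOL")
    return problems


print(check_xref_entry(b"0000000017 00000 n \n"))   # space + LF
print(check_xref_entry(b"0000000017 00000 n\r\n"))  # CR + LF
print(check_xref_entry(b"0000000017 00000 n\n\r"))  # LF + CR, as in the vendor's report
```

The first two entries pass; the third has the right length but is still rejected because of its line ending.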
----------------------------------------------------------------------------------------------
Hi,

Leonard Rosenthol wrote:

> Be aware that JHOVE is NOT a complete verifier - and it says so in
> their documentation.  It's certainly something that one could use as
> a "first pass", but don't rely on it.

As a side note, a bit of nitpicking: Theoretically, there can't be a
complete verifier. At least, positively determining it to be
»complete« is just impossible. No software could bring the proof that
the PDF _is_ valid (except if we define »valid« to mean »passed by this
so-called complete verifier«, but I think the reference is still the
PDF standard documents). It can only prove something to be _not_ valid.

Pragmatically, you're quite right: There are great variations in the
coverage of validation tests and I don't know of any Free Software
validators which would have a »big« coverage.

Any validator that reports »this document is a valid ... document« as
its result is potentially lying. But OTOH, you probably can't explain
this philosophical problem to software-buying customers, and they
wouldn't accept software that reports something along the lines of
»this document might actually be a valid ... document«.

You can probably trust a validator if it brings up an error in the
document rendering it invalid (but beware of implementation bugs). At
least, you have indications how to re-check this error report against
the relevant standard.

-hwh
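
One way to reflect this asymmetry in a tool is to give the report no "valid" outcome at all: either at least one check failed, or no errors were found by the checks that were actually run. The sketch below is only an illustration of that idea. All names are invented for the example, and the two checks are deliberately crude stand-ins (a "%PDF-" header at byte 0, a "%%EOF" marker somewhere in the last 1024 bytes), not a statement of what the standard requires in full.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    INVALID = "invalid"
    NO_ERRORS_FOUND = "no errors found"  # deliberately not "valid"


@dataclass
class Report:
    checks_run: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    @property
    def verdict(self) -> Verdict:
        return Verdict.INVALID if self.errors else Verdict.NO_ERRORS_FOUND


def check_header(data: bytes) -> list[str]:
    # Simplification: expects the header at the very first byte.
    return [] if data.startswith(b"%PDF-") else ["missing %PDF- header"]


def check_eof_marker(data: bytes) -> list[str]:
    # Simplification: looks for the marker anywhere in the last 1024 bytes.
    return [] if b"%%EOF" in data[-1024:] else ["no %%EOF marker near end of file"]


CHECKS = {"header": check_header, "eof": check_eof_marker}


def run_checks(data: bytes, checks=CHECKS) -> Report:
    """Run every check and record both what was run and what failed."""
    report = Report()
    for name, check in checks.items():
        report.checks_run.append(name)
        report.errors.extend(f"{name}: {msg}" for msg in check(data))
    return report


report = run_checks(b"%PDF-1.4\n%%EOF\n")
print(report.verdict.value, "after checks:", report.checks_run)
```

Listing checks_run next to the verdict lets the reader see how narrow the coverage was, and each error carries the name of the check that raised it, so it can be re-checked against the standard.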

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php

