Right - so the PDF validator in Acrobat Professional does all four of
the things you mention, as well as a few other checks and validations
(including loading each font and image to see if the stream data is
valid for the object in question).
It's not perfect either, of course, but AFAIK it's the best option
out there...
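As a rough illustration of the "load each stream and see if the data is valid" idea, here is a minimal Python sketch. It assumes the raw bytes of a single stream have already been extracted and that its /Filter is FlateDecode. The name `check_flate_stream` is hypothetical, and this is not a description of how Acrobat does it.

```python
import zlib
from typing import Optional


def check_flate_stream(raw: bytes) -> Optional[str]:
    """Return None if `raw` decodes as FlateDecode data, else an error message.

    Hypothetical helper: only FlateDecode is handled; other filters
    (DCTDecode, LZWDecode, ...) and /DecodeParms predictors are ignored.
    """
    try:
        zlib.decompress(raw)
    except zlib.error as exc:
        return f"FlateDecode stream is corrupt: {exc}"
    return None
```

Decoding only shows that the compressed bytes are intact; checking that the decoded font or image data actually makes sense for its object type is a separate, much larger job.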
Leonard
On Aug 21, 2008, at 1:17 PM, AJ Weber wrote:
Well, I think I can agree with most of your assertions.
However, there should be a pretty thorough way to do at least the
following:
1) For each "element" (object, etc.) that exists in the file, check
that it is "well formed" according to the PDF Reference.
2) For objects that reference other objects (bookmarks, GoToR
actions, etc.), check that these references are valid, and warn where
practical (i.e. if the reference points outside the PDF and the target
can't be found immediately, warn that it may be an invalid reference).
3) Mandatory elements, tags, objects, etc. are present and in their
expected places.
4) PDFs include only valid elements for the PDF Ref version they
specify.
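Checks #2 and #4 can at least be started cheaply. The Python sketch below reads the header version and warns about indirect references that have no matching object definition. It makes strong simplifying assumptions: an uncompressed file with no object streams, and a crude regex scan that content inside streams or strings can fool. Warning rather than failing is deliberate, since the PDF Reference treats a reference to an undefined object as a reference to the null object, not as an error. Both function names are hypothetical.

```python
import re
from typing import List, Optional

# Crude patterns; they ignore object streams and can misfire on stream content.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")
OBJ_DEF_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
OBJ_REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def header_version(data: bytes) -> Optional[str]:
    """Return the version from a %PDF-x.y header at offset 0, or None."""
    m = HEADER_RE.match(data)
    return f"{m.group(1).decode()}.{m.group(2).decode()}" if m else None


def dangling_references(data: bytes) -> List[str]:
    """Warn about 'N G R' references with no corresponding 'N G obj'."""
    defined = {(int(n), int(g)) for n, g in OBJ_DEF_RE.findall(data)}
    referenced = {(int(n), int(g)) for n, g in OBJ_REF_RE.findall(data)}
    return [
        f"reference {n} {g} R has no matching object"
        for n, g in sorted(referenced)
        if (n, g) not in defined
    ]
```

A real check #4 would go further and compare the features actually used in the file against the version the header declares.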
A vendor gave me this explanation of why they couldn't "read" the
example PDF I was having issues with:
The basic problem was that in "obj" tables after the figures or
letters there must be a space code. In this file these tables
contained entries followed by a NL + CR and not by a space.
In my validator-spec (above), I would think this would be flagged by
#1.
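If, as seems likely, the vendor's "obj tables" are the cross-reference (xref) table, the rule they describe is in the PDF Reference. Each xref entry is exactly 20 bytes: a 10-digit offset, a space, a 5-digit generation number, a space, `n` or `f`, and a two-byte end-of-line marker that must be SP CR, SP LF, or CR LF. An entry ending in LF CR is therefore malformed. Here is a minimal Python sketch of that one check. It assumes the bytes of one classic xref subsection (not a cross-reference stream) and its entry count have already been located. `check_xref_entries` is a hypothetical name.

```python
import re
from typing import List

# Fields of one xref entry: 10-digit offset, 5-digit generation, 'n' or 'f'.
ENTRY_RE = re.compile(rb"\d{10} \d{5} [nf]")
# The only two-byte end-of-line markers the PDF Reference allows here.
VALID_EOLS = (b" \r", b" \n", b"\r\n")


def check_xref_entries(entries: bytes, count: int) -> List[str]:
    """Validate `count` fixed-width 20-byte xref entries; return error messages."""
    errors = []
    if len(entries) < 20 * count:
        errors.append(f"expected {20 * count} bytes for {count} entries, got {len(entries)}")
        count = len(entries) // 20  # only inspect the complete entries we have
    for i in range(count):
        entry = entries[20 * i : 20 * i + 20]
        if not ENTRY_RE.fullmatch(entry[:18]):
            errors.append(f"entry {i}: malformed fields {entry[:18]!r}")
        if entry[18:] not in VALID_EOLS:
            errors.append(f"entry {i}: invalid end-of-line {entry[18:]!r}")
    return errors
```

With this, an entry such as `0000000017 00000 n` followed by LF CR is reported as having an invalid end-of-line, which is exactly the kind of "well formed" failure check #1 is meant to catch.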
Thanks again,
AJ
----------------------------------------------------------------------------------------------
Hi,
Leonard Rosenthol wrote:
> Be aware that JHOVE is NOT a complete verifier - and it says so in
> their documentation. It's certainly something that one could use as
> a "first pass", but don't rely on it.
As a side note, a bit of nitpicking: theoretically, there can't be a
complete verifier. At least, positively establishing that a verifier is
»complete« is impossible. No software can prove that a PDF _is_ valid
(unless we define »valid« to mean »passed by this so-called complete
verifier«, but I think the reference is still the PDF standard
documents). It can only prove something to be _not_ valid.
Pragmatically, you're quite right: there are great variations in the
coverage of validation tests, and I don't know of any Free Software
validators with »big« coverage.
Any validator that reports »this document is a valid ... document« is
potentially lying. But OTOH, you probably can't explain this
philosophical problem to software-buying customers, and they wouldn't
accept software that reports something along the lines of »this
document might actually be a valid ... document«.
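One way a validator could avoid that overclaim is shown in this small Python sketch. It distinguishes "invalid" (a concrete error was found) from "no errors found by the checks we ran", and never prints "valid". `Report`, `run_checks`, and the wording of the verdicts are hypothetical, not taken from any existing validator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A check takes the raw file bytes and returns a list of error messages.
Check = Callable[[bytes], List[str]]


@dataclass
class Report:
    checks_run: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def verdict(self) -> str:
        if self.errors:
            # A concrete error can be re-checked against the standard.
            return f"invalid: {len(self.errors)} error(s) found"
        # Passing every implemented check says nothing about checks we lack,
        # so never claim the file is "valid".
        return f"no errors found ({len(self.checks_run)} checks run)"


def run_checks(data: bytes, checks: Dict[str, Check]) -> Report:
    """Run each named check and collect its errors, prefixed by the check name."""
    report = Report()
    for name, check in checks.items():
        report.checks_run.append(name)
        report.errors.extend(f"{name}: {msg}" for msg in check(data))
    return report
```

Listing the checks that ran also tells the reader how much (or how little) coverage stands behind a clean result.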
You can probably trust a validator when it reports an error that
renders the document invalid (but beware of implementation bugs). At
least then you have an indication of how to re-check the error report
against the relevant standard.
-hwh
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.1t3xt.com/docs/book.php