Right - so the PDF validator in Acrobat Professional does all four of the things you mention, as well as a few other checks and validations (including loading each font and image to see if the stream data is valid for the object in question).

It's not perfect either, of course, but AFAIK it's the best option out there...

Leonard


On Aug 21, 2008, at 1:17 PM, AJ Weber wrote:

Well I think I can agree with most of your assertions.

However, there should be a pretty thorough way to do at least the following:

1) For each "element" (object, etc.) that exists in the file, check that it is "well formed" according to the PDF Reference.
2) For objects that reference other objects (bookmarks, GoToR, etc.), check that these references are valid, and warn where practical (i.e. the ref is outside the PDF, so if it's not immediately found, warn that it may be an invalid ref).
3) Mandatory elements, tags, objects, etc. are present and in their expected places.
4) PDFs include only valid elements for the PDF Ref version they specify.
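
As a rough illustration of check #2, here is a minimal Python sketch that lists indirect references with no matching object definition. Everything in it is an assumption made for illustration: the function name `dangling_references` is hypothetical, and the approach is a naive regex scan over the raw bytes, not a real PDF parser. It will miss objects stored in compressed object streams and can be fooled by matching text inside stream data. As far as I recall, the PDF Reference treats a reference to an undefined object as a reference to null rather than an error, so a hit here is worth a warning, not a hard failure.

```python
import re

# "N G obj" starts an indirect object definition; "N G R" is an indirect reference.
OBJ_DEF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")
OBJ_REF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R\b")


def dangling_references(pdf_bytes: bytes) -> set[tuple[int, int]]:
    """Return (object number, generation) pairs that are referenced but never defined.

    Naive sketch: scans raw bytes, so it ignores object streams and
    may pick up false matches inside binary stream data.
    """
    defined = {(int(n), int(g)) for n, g in OBJ_DEF.findall(pdf_bytes)}
    referenced = {(int(n), int(g)) for n, g in OBJ_REF.findall(pdf_bytes)}
    return referenced - defined


sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Outlines 3 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)
for num, gen in sorted(dangling_references(sample)):
    print(f"warning: reference to undefined object {num} {gen}")
```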

For the example PDF I was having issues with, a vendor gave me this explanation as to why they couldn't "read it":

"The basic problem was that in "obj" tables after the figures or letters there must be a space code. In this file these tables contained entries followed by a NL + CR and not by a space."

In my validator spec (above), I would think this would be flagged by #1.
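
A minimal Python sketch of what such a well-formedness check could look like, assuming the vendor's "obj" tables are the cross-reference (xref) table. To my reading of the PDF Reference, each xref entry is exactly 20 bytes: a 10-digit offset, a space, a 5-digit generation number, a space, `n` or `f`, and a two-byte end-of-line marker that must be space + CR, space + LF, or CR + LF. The function name `check_xref_entry` is hypothetical, and the sketch checks a single raw entry only; locating the table in the file is left out.

```python
import re

# Fixed-width part of an entry: offset, generation, and in-use/free flag.
ENTRY_BODY = re.compile(rb"\d{10} \d{5} [nf]")
# The only two-byte end-of-line markers assumed valid here.
VALID_EOL = (b" \r", b" \n", b"\r\n")


def check_xref_entry(entry: bytes) -> list[str]:
    """Return a list of problems found in one raw xref entry (empty if none)."""
    if len(entry) != 20:
        return [f"entry is {len(entry)} bytes, expected 20"]
    problems = []
    if not ENTRY_BODY.fullmatch(entry[:18]):
        problems.append("offset/generation/type fields are malformed")
    if entry[18:] not in VALID_EOL:
        problems.append(f"invalid end-of-line marker {entry[18:]!r}")
    return problems


# An entry terminated by LF + CR (the case the vendor described) is flagged.
print(check_xref_entry(b"0000000017 00000 n\n\r"))
```

Under these assumptions an entry ending in LF + CR fails the check, while one ending in CR + LF passes, so the vendor's "must be a space" is slightly stricter than the rule sketched here.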

Thanks again,
AJ
----------------------------------------------------------------------------------------------
Hi,

Leonard Rosenthol wrote:

> Be aware that JHOVE is NOT a complete verifier - and it says so in
> their documentation.  It's certainly something that one could use as
> a "first pass", but don't rely on it.

As a side note, a bit of nitpicking: theoretically, there can't be a
complete verifier. At least, positively determining that it is
»complete« is just impossible. No software could prove that the PDF _is_
valid (except if we define »valid« to mean »passed by this so-called
complete verifier«, but I think the reference is still the PDF standard
documents). It can only prove something to be _not_ valid.

Pragmatically, you're quite right: there are great variations in the
coverage of validation tests, and I don't know of any Free Software
validator that has »big« coverage.

Any validator that reports »this document is a valid ... document« is
potentially lying. But OTOH, you probably can't explain this
philosophical problem to software-buying customers, and they
wouldn't accept software that reports something along the lines of
»this document might actually be a valid ... document«.

You can probably trust a validator when it reports an error that
renders the document invalid (but beware of implementation bugs). At
least then you have an indication of how to re-check the error report
against the relevant standard.
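
One way to reflect this asymmetry in a tool's output is sketched below in Python. The `ValidationReport` class and its fields are purely hypothetical and not taken from any existing validator: errors yield a definite "invalid" verdict, while a clean run only states how many checks passed.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Hypothetical result type: it can prove invalidity, never validity."""
    checks_run: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    def verdict(self) -> str:
        if self.errors:
            # A reported error can be re-checked against the standard.
            return f"INVALID: {len(self.errors)} error(s) found"
        # A clean run only says that the checks performed found nothing.
        return f"no errors found by {len(self.checks_run)} check(s); validity not proven"


report = ValidationReport(checks_run=["xref entries", "object references"])
print(report.verdict())
report.errors.append("xref entry 3: invalid end-of-line marker")
print(report.verdict())
```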

-hwh

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php

