Well, I think I can agree with most of your assertions.

However, there should be a pretty thorough way to do at least the following:

1) For each "element" (object, etc.) that exists in the file, check that it is 
"well formed" according to the PDF Reference.
2) For objects that reference other objects (bookmarks, GoToR actions, etc.),
check that these references are valid, and warn where practical (e.g. if the
target is outside the PDF and is not immediately found, warn that it may be an
invalid reference).
3) Check that mandatory elements, tags, objects, etc. are present and in their
expected places.
4) Check that the PDF includes only elements that are valid for the PDF
Reference version it specifies.
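
A minimal sketch of what check #2 might look like for internal references,
assuming the objects are stored uncompressed so that "N G obj" definitions and
"N G R" references are visible in the raw bytes. The function name
dangling_references is hypothetical, and a real validator would have to parse
the file properly (object streams, compressed streams, strings that merely
look like references), so treat this only as an illustration of the idea:

```python
import re

# "N G obj" introduces an indirect object; "N G R" refers to one.
# (?<!\d) makes sure a match starts at the beginning of a digit run.
OBJ_DEF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")
OBJ_REF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R\b")


def dangling_references(pdf_bytes: bytes) -> set[tuple[int, int]]:
    """Return (object number, generation) pairs that are referenced
    somewhere in the bytes but never defined.

    Assumption: a naive byte scan, not a real PDF parser.
    """
    defined = {(int(n), int(g)) for n, g in OBJ_DEF.findall(pdf_bytes)}
    referenced = {(int(n), int(g)) for n, g in OBJ_REF.findall(pdf_bytes)}
    return referenced - defined


if __name__ == "__main__":
    sample = (
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /Outlines 9 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    )
    # Object 9 is referenced by the catalog but never defined.
    print(dangling_references(sample))
```

Anything this reports would be a candidate for a warning rather than a hard
error, since the scan cannot see objects stored inside compressed streams.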

For the example PDF I was having issues with, a vendor gave me this explanation
as to why they couldn't "read it":
  The basic problem was that in "obj" tables after the figures or letters there 
must be a space code. In this file these tables contained entries followed by a 
NL + CR and not by a space.
In my validator-spec (above), I would think this would be flagged by #1.
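
To make that concrete, here is a small sketch of such a well-formedness check.
It assumes the vendor's "obj tables" means the cross-reference (xref) table,
which is my reading and not something the vendor stated. In the PDF
specification each xref entry is exactly 20 bytes: a 10-digit offset, a space,
a 5-digit generation number, a space, "n" or "f", and a 2-byte end-of-line
that is SP CR, SP LF, or CR LF. The function name check_xref_entries is
hypothetical, and the caller is assumed to have already located the entries
and the entry count from the subsection header:

```python
import re

# One xref entry: offset, generation, in-use flag, then a 2-byte
# end-of-line that must be SP CR, SP LF, or CR LF.
XREF_ENTRY = re.compile(rb"\d{10} \d{5} [nf](?: \r| \n|\r\n)")
ENTRY_LEN = 20


def check_xref_entries(section: bytes, count: int) -> list[str]:
    """Return a list of problems found in `count` fixed-width entries.

    `section` must start at the first entry of an xref subsection.
    """
    problems = []
    for i in range(count):
        entry = section[i * ENTRY_LEN:(i + 1) * ENTRY_LEN]
        if len(entry) < ENTRY_LEN:
            problems.append(f"entry {i}: truncated ({len(entry)} bytes)")
            break
        if not XREF_ENTRY.fullmatch(entry):
            problems.append(f"entry {i}: malformed {entry!r}")
    return problems


if __name__ == "__main__":
    good = b"0000000000 65535 f \n0000000017 00000 n \n"
    # Entries terminated by LF + CR with no space, as the vendor described.
    bad = b"0000000000 65535 f\n\r0000000017 00000 n\n\r"
    print(check_xref_entries(good, 2))
    print(check_xref_entries(bad, 2))
```

Because the entries are fixed-width, a check like this can flag the exact
entry that is off, which is the kind of report a vendor could act on.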

Thanks again,
AJ

----------------------------------------------------------------------------------------------
Hi,

Leonard Rosenthol wrote:

> Be aware that JHOVE is NOT a complete verifier - and it says so in
> their documentation. It's certainly something that one could use as
> a "first pass", but don't rely on it.

As a side note, and a bit of nitpicking: theoretically, there can't be a
complete verifier. At the very least, positively determining that one is
»complete« is just impossible. No software could prove that a PDF _is_
valid (unless we define »valid« to mean »passed by this so-called
complete verifier«, but I think the reference is still the PDF standard
documents). It can only prove something to be _not_ valid.

Pragmatically, you're quite right: there are great variations in the
coverage of validation tests, and I don't know of any Free Software
validator that has »big« coverage.

Any validator that reports »this document is a valid ... document« is
potentially lying. But OTOH, you probably can't explain this
philosophical problem to software-buying customers, and they wouldn't
accept software that reports something along the lines of
»this document might actually be a valid ... document«.

You can probably trust a validator if it reports an error that renders
the document invalid (but beware of implementation bugs). At least then
you have an indication of how to re-check the error report against
the relevant standard.

-hwh

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
