>> There is no such thing as "canonical" PDF - anything that complies with the
>> PDF specification is valid. That allows for various uses of compression,
>> ASCII encoding, etc.
>
>Well, not really. If there are rules for the PDF standard then you could in
>fact create some alternative representation - it could be super big, verbose,
>complicated, etc. but it may be a useful intermediate form for various types
>of work such as debugging or ad hoc editing where you don't want to waste time
>writing custom code to do something simple.
>

No argument!
BUT an "intermediate format" (or an "alternative format") and a "canonical format" are VERY VERY different things...

There are many folks who have developed alternative representations of PDF, whether in XML or other formats, including Adobe ourselves. For example, Adobe has a project codenamed "Mars" on our Labs site (<http://labs.adobe.com/wiki/index.php/Mars>) which describes an XML+ZIP-based representation of PDF. It supports all of the features of PDF from PDF 1.7. We provide some tooling for Acrobat & Reader, and you are welcome to develop your own. But again, that's NOT canonical - just alternative.

>> That's why libraries such as iText exist - to provide you with higher level
>> APIs (where possible). They are what one would use to create automated
>> test tools, validators, etc. And many such tools already do exist - so
>> it's definitely doable (and has been done).
>>
>If you took that attitude you couldn't even hide behind "but PDF is a
>standard", since then the argument is "well, I have API xyz and we can do
>anything with it, if you use my ABC format". I guess having a list would
>help. Is there a PDF developer download somewhere with tools like this?
>

Adobe Acrobat Professional includes a PDF validator feature as part of its Preflight module, and has since version 7. It is the only publicly available validator that I am aware of, though I have spoken to at least a half-dozen commercial PDF vendors who have told me that they have developed their own validators for their own use.

There used to be two limited open source validators - JHOVE (<http://hul.harvard.edu/jhove/pdf-hul.html>) and Multivalent (<http://multivalent.sourceforge.net/Tools/pdf/Validate.html>). But to my knowledge, neither is currently supported/updated. Since both were Java-based OSS, I would think you could pick them up and run with them if you wished.
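For a sense of where a home-grown validator might start, here is a minimal Python sketch. It is not taken from Preflight, JHOVE, or Multivalent; the function name `quick_pdf_sanity_check` is hypothetical, and the only things it looks at are the `%PDF-x.y` header and the `startxref` / `%%EOF` markers at the end of the file. A real validator checks far more (cross-reference table, object syntax, stream filters, and so on).

```python
import re


def quick_pdf_sanity_check(data):
    """Return a list of problems; an empty list means the basic markers are present.

    Hypothetical helper for illustration only. It looks at the outer file
    markers and nothing else, so passing this check does NOT mean the PDF is valid.
    """
    problems = []

    # A PDF file is expected to begin with a header such as "%PDF-1.7".
    if not re.match(rb"%PDF-\d\.\d", data):
        problems.append("missing %PDF-x.y header")

    # The file is expected to end with "startxref", a byte offset, and "%%EOF".
    tail = data[-1024:]
    if not re.search(rb"startxref\s+\d+\s+%%EOF\s*$", tail):
        problems.append("missing startxref/%%EOF trailer")

    return problems


if __name__ == "__main__":
    # Hand-built bytes with the right outer markers but no real content.
    sample = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\nstartxref\n9\n%%EOF\n"
    print(quick_pdf_sanity_check(sample))
    print(quick_pdf_sanity_check(b"not a pdf"))
```

Note that the first sample passes even though it has no cross-reference table at all, which is exactly why a check like this is only a first filter and not a substitute for a real validator.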

Leonard