----------------------------------------
> From: lrose...@adobe.com
> To: itext-questions@lists.sourceforge.net
> Date: Mon, 10 May 2010 19:01:15 -0700
> Subject: Re: [iText-questions] how to detect remote links in a PDF ?
>
>>> There is no such thing as "canonical" PDF - anything that complies with the
>>> PDF specification is valid. That allows for various uses of compression,
>>> ASCII encoding, etc.
>>
>>Well, not really. If there are rules for the PDF standard then you could in
>>fact create some alternative representation. It could be super big, verbose,
>>complicated, etc., but it may be a useful intermediate form for various types
>>of work, such as debugging or ad hoc editing, where you don't want to waste
>>time writing custom code to do something simple.
>>
> No argument!
>
> BUT an "intermediate format" (or an "alternative format") and a "canonical
> format" are VERY VERY different things...
Well, at least "canonical" could mean something like "PDF that doesn't do
anything fancy and has rules for otherwise arbitrary choices". Then you could
do simple things like ASCII searches, and maybe binary diffs to test for pixel
equality, etc.
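
To make the ASCII-search point concrete, here is a minimal Python sketch (standard library only). It assumes made-up byte fragments rather than a complete PDF, and the helper names (wrap_in_flate_stream, find_uris, inflate_streams) are hypothetical, not part of iText or any PDF tool. It only illustrates why a plain byte search can miss a link that sits inside a Flate-compressed stream, and why an uncompressed form makes the same search work.

```python
import re
import zlib

# Made-up link action dictionaries (assumption: not taken from a real PDF).
ACTIONS = b"\n".join(
    b"<< /S /URI /URI (http://example.com/page%d) >>" % i for i in range(5)
)

def wrap_in_flate_stream(payload: bytes) -> bytes:
    """Wrap payload in a Flate-compressed stream, roughly as a PDF writer might."""
    data = zlib.compress(payload)
    header = b"<< /Filter /FlateDecode /Length %d >>\nstream\n" % len(data)
    return header + data + b"\nendstream\n"

def find_uris(raw: bytes) -> list:
    """Naive ASCII search for '/URI (...)' strings in raw bytes."""
    return re.findall(rb"/URI\s*\(([^)]*)\)", raw)

def inflate_streams(raw: bytes) -> bytes:
    """Replace each stream body with its inflated bytes.

    Sketch only: it pattern-matches on 'stream'/'endstream' and assumes every
    stream is Flate-encoded, ignoring the /Filter and /Length entries.
    """
    def repl(match):
        return b"stream\n" + zlib.decompress(match.group(1)) + b"\nendstream"
    return re.sub(rb"stream\n(.*?)\nendstream", repl, raw, flags=re.S)

if __name__ == "__main__":
    raw = wrap_in_flate_stream(ACTIONS)
    print("raw search:     ", find_uris(raw))
    print("inflated search:", find_uris(inflate_streams(raw)))
```

The plain search over the raw bytes misses the links, while the same search over the inflated bytes finds them. A real tool would parse the stream dictionary instead of pattern-matching on "stream".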
>
> There are many folks who have developed alternative representations of PDF,
> whether in XML or other formats, including Adobe ourselves. For example,
> Adobe has a project codenamed "Mars" on our Labs site () which describes an
> XML+ZIP-based representation of PDF. It supports all of the features of PDF
> from PDF 1.7. We provide some tooling for Acrobat & Reader, and you are
> welcome to develop your own.
>
> But again, that's NOT canonical - just alternative.
But that would work for the original purpose too. Maybe you should mention
these on itext somewhere and refer people to them. You could hardly be accused
of being any more biased than you already are, and if the tools work, who
cares if you are biased? LOL.
From your terse descriptions, that even sounds like a sane and workable
approach, not what I would have expected (sorry, had to interject, LOL).
This is also not irrelevant to the itext implementation, as a prior thread was
talking about optimizations at an algorithm level. If a parsed or intermediate
form had attributes that make various manipulations easy, it may be a good
thing for itext to parse into, or even write out, for other canned (itext
based or not) tools to use:
cat pdf | itext_parse_to_intermediate_form | my_itext_tool | intermediate_to_pdf -O3 > new.pdf
Piping can be slow but obviously you can start mashing tools together etc.
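
As a rough illustration of what one stage in such a pipe might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the line-oriented intermediate form ("URI ...", "TEXT ..." records), the stage name rewrite_links, and the helper run_stage are all hypothetical and do not correspond to any existing itext tool or format.

```python
import io
import sys

def rewrite_links(lines, old_host, new_host):
    """Middle pipeline stage: rewrite URI records, pass everything else through.

    Assumes a hypothetical intermediate form with one record per line, where
    remote links appear as lines starting with 'URI '.
    """
    for line in lines:
        if line.startswith("URI "):
            line = line.replace(old_host, new_host)
        yield line

def run_stage(stage, src, dst, **kwargs):
    """Read lines from src, apply the stage, and write the results to dst."""
    for line in stage(src, **kwargs):
        dst.write(line)

if __name__ == "__main__":
    # In a shell pipe this stage would sit between the parser and the writer.
    run_stage(rewrite_links, sys.stdin, sys.stdout,
              old_host="old.example", new_host="new.example")
```

Because each stage only reads lines and writes lines, stages can be tested in isolation with in-memory files and then chained in a shell pipe.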
>
>
>>> That's why libraries such as iText exist - to provide you with higher level
>>> APIs (where possible). They are what one would use to create
>>> automated test tools, validators, etc. And many such tools already do exist
>>> - so it's definitely doable (and has been done).
>>>
>>If you took that attitude you couldn't even hide behind "but pdf is a
>>standard", since then the argument is "well, I have API xyz and we can do
>>anything with it, if you use my ABC format". I guess having a list would
>>help. Is there a pdf developer download somewhere with tools like this?
>>
> Adobe Acrobat Professional includes a PDF validator feature as part of its
> Preflight module, and has since version 7. It is the only publicly available
> validator that I am aware of, though I have spoken to at least a half-dozen
> commercial PDF vendors that have told me that they have developed their own
> validators for their own use.
>
> There used to be two limited open source validators - JHOVE () and
> Multivalent (). But to my knowledge, neither is currently supported/updated.
> Since both were Java-based OSS, I would think you could pick them up and run
> with them if you wished.
>
Ok, sounds like reasonable starting points.
I'm not saying it is trivial to do any of this, but it does seem much of the
traffic here never gets
referred to any simple diagnostics.
>
> Leonard
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.itextpdf.com/book/
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list: http://1t3xt.info/tutorials/keywords/