Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-11 Thread Mike Marchywka










 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Mon, 10 May 2010 19:01:15 -0700
 Subject: Re: [iText-questions] how to detect remote links in a PDF ?

 There is no such thing as canonical PDF - anything that complies with the 
 PDF specification is valid. That allows for various uses of compression, 
 ASCII encoding, etc.

Well, not really. If there are rules for the PDF standard then you could in 
fact create some alternative representation- it could
be super big, verbose, complicated, etc but it may be a useful intermediate 
form for various types of work
such as debug or adhoc editing where you don't want to waste time writing 
custom code to do something simple.

 No argument!

 BUT an intermediate format (or an alternative format) and a canonical 
 format are VERY VERY different things...

Well, at least a canonical form would be something like PDF that doesn't do 
anything fancy and has rules for otherwise arbitrary choices. Then you could do 
simple things like ASCII searches and maybe binary diffs to test for pixel 
equality, etc.


 There are many folks who have developed alternative representations of PDF, 
 whether in XML or other formats, including Adobe ourselves. For example, 
 Adobe has a project codenamed Mars on our Labs site 
 (http://labs.adobe.com/wiki/index.php/Mars) which describes an 
 XML+ZIP-based representation of PDF. It supports all of the features of PDF 
 from PDF 1.7. We provide some tooling for Acrobat & Reader, and you are 
 welcome to develop your own.

 But again, that's NOT canonical - just alternative.

But that would work for the original purpose too. Maybe you should mention 
these on itext somewhere and refer people to them. It is hard to say you will 
be accused of being biased any more than you already are, and if the tools 
work, who cares if you are biased? LOL.

From your terse descriptions, that even sounds like a sane and workable 
approach, not what I would have expected (sorry, had to interject, LOL).

This is also not irrelevant to the iText implementation, as a prior thread was 
talking about optimizations at an algorithm level. If you had some attributes 
of a parsed or intermediate form that make various manipulations easy, it may 
be a good thing for iText to parse into, or even write out, for other canned 
(iText-based or not) tools to use.

cat pdf | itext_parse_to_intermediate_form | my_itext_tool | intermediate_to_pdf -O3 new.pdf

Piping can be slow but obviously you can start mashing tools together etc. 




 That's why library such as iText exist - to provide you with higher level 
 APIs (where possible). They are what one would use to create
 automated test tools, validators, etc. And many such tools already do exist 
 - so it's definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a 
standard", since then the argument is "well, I have API xyz and we can do 
anything with it, if you use my ABC format". I guess having a list would help: 
is there a PDF developer download somewhere with tools like this?

 Adobe Acrobat Professional includes a PDF validator feature as part of its 
 Preflight module, and has since version 7. It is the only publicly available 
 validator that I am aware of, though I have spoken to at least a half-dozen 
 commercial PDF vendors that have told me that they have developed their own 
 validators for their own use.

 There used to be two limited open source validators - JHOVE 
 (http://hul.harvard.edu/jhove/pdf-hul.html) and Multivalent 
 (http://multivalent.sourceforge.net/Tools/pdf/Validate.html). But to my 
 knowledge, neither is currently supported/updated. Since both were Java-based 
 OSS, I would think you could pick them up and run with them if you wished.

Ok, sounds like reasonable starting points.

I'm not saying it is trivial to do any of this, but it does seem much of the 
traffic here never gets
referred to any simple diagnostics.



 Leonard


 --

 ___
 iText-questions mailing list
 iText-questions@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/itext-questions

 Buy the iText book: http://www.itextpdf.com/book/
 Check the site with examples before you ask questions: 
 http://www.1t3xt.info/examples/
 You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
  

Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-11 Thread Leonard Rosenthol
Well, at least a canonical form would be something like PDF that doesn't do 
anything fancy and has rules for otherwise arbitrary choices. Then you could do 
simple things like ASCII searches and maybe binary diffs to test for pixel 
equality, etc.

While the ability to do ASCII searches and maybe binary diffs are goals that 
you desire in a file format - neither were goals that PDF had (or has) and so 
the design for it doesn't take either into account.  

If you think those should be underlying goals for PDF 2.0 (ISO 32000-2) - NOW 
IS THE TIME FOR YOU TO GET INVOLVED!  We are still working on the next version 
of PDF at the ISO.  ANYONE can get involved on the committee, at NO COST.  Just 
contact your country's standards body and volunteer.  (If you don't know who 
that is, let me know what country you reside in and I will be happy to provide 
a name & email for you.)

PDF is a fully open standard.  ANYONE can contribute.  We WELCOME participation.

BUT if you choose not to get involved, and we don't do the things you want, 
then your complaints will fall on deaf ears...


Leonard




Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-11 Thread Bruno Lowagie
Leonard Rosenthol wrote:
 PDF is a fully open standard.  ANYONE can contribute.  We WELCOME 
 participation.

I'm interested!
I'm going to the Adobe Pulse next week:
http://events.adobe.co.uk/cgi-bin/register.cgi?country=uk&eventid=9660&venueid=9813
Is there somebody I can talk to about this?
best regards,
Bruno



Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-11 Thread Leonard Rosenthol
Yes, I believe that there should be a gentleman there named Colin that can help 
you...

Leonard



Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread 1T3XT info
pieter vankeerberghen wrote:
 Colleagues,
 
 For an application, one needs to detect the hyperlinks (i.e. done with 
 Chunk.setRemoteGoto) in a PDF which point to an other PDF, can someone 
 point me to a solution ?

Links are annotations, and annotations are referred to in the page 
dictionary of every page. So you need to loop over all the pages in an 
existing PDF and use getPageN(n) to get the page dictionary.

Get the Annots entry from this dictionary and loop over the annotations. 
Check if there are annotations with Subtype Link. Inspect those annotations.

Note that this is only one way to link to an external page. There could 
be links using other annotations too (e.g. involving a JavaScript action).
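
The traversal described above can be sketched roughly as follows. This is an 
illustration in Python, not iText code: plain dicts stand in for the page 
dictionaries that getPageN(n) would return, and the function name 
find_remote_links and the sample data are made up for the example. The key 
names (Annots, Subtype, A, S, F, URI) are the ones the PDF specification uses 
for link annotations and their actions, and GoToR is the action type the 
specification defines for jumping into another PDF file.

```python
# Plain dicts stand in for parsed PDF dictionaries (an assumption made for
# illustration); the key names follow the PDF specification.
def find_remote_links(pages):
    """Return (page_number, action_type, target) for links that leave the file."""
    found = []
    for number, page in enumerate(pages, start=1):
        for annot in page.get("Annots", []):
            if annot.get("Subtype") != "Link":
                continue  # not a link annotation
            action = annot.get("A", {})
            kind = action.get("S")
            if kind == "GoToR":    # remote go-to: destination in another PDF
                found.append((number, kind, action.get("F")))
            elif kind == "URI":    # link to a web address
                found.append((number, kind, action.get("URI")))
    return found


# Hypothetical two-page document, used only to exercise the function.
pages = [
    {"Annots": [
        {"Subtype": "Link", "A": {"S": "GoToR", "F": "other.pdf", "D": "intro"}},
        {"Subtype": "Text", "Contents": "a sticky note"},
    ]},
    {"Annots": [
        {"Subtype": "Link", "A": {"S": "URI", "URI": "http://www.1t3xt.info"}},
        {"Subtype": "Link", "A": {"S": "GoTo", "D": "chapter2"}},
    ]},
]

for hit in find_remote_links(pages):
    print(hit)
```

Internal GoTo links and non-link annotations are skipped on purpose. As noted 
above, links hidden behind other mechanisms (a JavaScript action, for 
instance) would need extra checks that this sketch does not attempt.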
-- 
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info



Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Mike Marchywka











 Date: Sun, 9 May 2010 23:08:51 +0200
 From: papa...@googlemail.com
 To: itext-questions@lists.sourceforge.net
 Subject: [iText-questions] how to detect remote links in a PDF ?

 Colleagues,

 For an application, one needs to detect the hyperlinks (i.e. done with
 Chunk.setRemoteGoto) in a PDF which point to an other PDF, can someone
 point me to a solution ?

Question for Leonard or others who have read the spec: if you literally ONLY
want to list the links, not parse the document or determine any context,
are they likely to be hidden, or can you just use text
tools to find strings that start with or contain http? For example,


  cat *.pdf ../Desktop/*.pdf | sed -e 's/[^a-zA-Z0-9/:.?]/\n/g' | grep http
  cat *.pdf ../Desktop/*.pdf | strings | grep http

These seem to work in that they find things with http, but I'm not sure what would be
missing. Many of these seem to be surrounded by XML or prefixed with /A, 
but I'm not sure what other contexts may exist.
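
For anyone without those shell tools at hand, roughly the same scan can be 
sketched in Python. This is only an approximation of strings piped into grep: 
the names URL_PATTERN and scan_for_urls and the sample bytes are made up for 
the example, and like the pipelines above it only sees text that is stored 
uncompressed in the file.

```python
import re

# Runs of URL-safe ASCII characters that start with http:// or https://,
# which is roughly what `strings | grep http` would surface.
URL_PATTERN = re.compile(rb"https?://[A-Za-z0-9/:.?&=_%#~+-]+")


def scan_for_urls(data):
    """Return every URL-looking byte run found in the raw file content."""
    return [m.group().decode("ascii") for m in URL_PATTERN.finditer(data)]


# Hypothetical uncompressed link annotation, as it might appear in a file.
sample = (
    b"<< /Type /Annot /Subtype /Link "
    b"/A << /S /URI /URI (http://www.1t3xt.info/examples/) >> >>"
)
print(scan_for_urls(sample))
```

To run it on a real file, read the file in binary mode and pass the bytes to 
scan_for_urls.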

Thanks.







 Thank you very much in advance,
 Pieter Vankeerberghen



Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Leonard Rosenthol
Prior to PDF 1.5, you could have done a grep (or equivalent) since only stream 
objects were compressed.  However, as of PDF 1.5, we now have object streams, 
where groups of objects are placed into a stream and then compressed - which 
means that grep will no longer work.
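
A small Python sketch of why the raw search stops working. The object below is 
made up and deliberately simplified (a real object stream also carries /N, 
/First and an offset table), so treat it as an illustration of the effect of 
Flate compression, not as valid PDF.

```python
import zlib

# A made-up dictionary holding a link, as it could sit inside an object stream.
plain_object = b"<< /S /URI /URI (http://www.1t3xt.info/) >>"

# Pre-1.5 style: the dictionary is written directly into the file.
uncompressed_file = b"1 0 obj\n" + plain_object + b"\nendobj\n"

# 1.5 style: the same bytes are deflated and stored as stream data.
deflated = zlib.compress(plain_object)
compressed_file = (
    b"1 0 obj\n<< /Type /ObjStm /Filter /FlateDecode >>\nstream\n"
    + deflated
    + b"\nendstream\nendobj\n"
)

# Equivalent of grepping the raw bytes for "http" in each variant.
print(b"http" in uncompressed_file)
print(b"http" in compressed_file)

# The link is still there once the stream data is inflated again.
print(zlib.decompress(deflated) == plain_object)
```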

Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, such 
as PDF/A) use object stream compression to keep file sizes down.  I've been 
trying to recommend that other products do the same.

So while there certainly exist lots of PDFs that you could grep, the numbers 
are reducing daily...

Leonard



Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Mike Marchywka







 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Mon, 10 May 2010 06:44:13 -0700
 Subject: Re: [iText-questions] how to detect remote links in a PDF ?

 Prior to PDF 1.5, you could have done a grep (or equivalent) since only 
 stream objects were compressed. However, as of PDF 1.5, we now have object 
 streams, where groups of objects are placed into a stream and then 
 compressed - which means that grep will no longer work.

 Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, 
 such as PDF/A) use object stream compression to keep file sizes down. I've 
 been trying to recommend that other products do the same.


Is there some utility, like in pdftk, to convert a PDF with arbitrary stuff in 
it to some standard or canonical format that lets it be used with other tools, 
so you don't have to write custom code for every little trivial variation of a 
thing you wish to accomplish? For example,

cat xxx.pdf | pdf_to_standard_form | grep http 


Obviously applicability would go beyond the immediate question, but it would 
also let people writing iText code have some way to check their results more 
easily than "it opened in proprietary Adobe product X, but in black box Y it 
greyed out 3 menu options and wouldn't let me save it unless blah blah blah".

There is nothing wrong with a human-readable end product, but given the 
complexity of these things it would be nice to use computers to automate 
certain things, like checking for links or other attributes. Without the 
ability to use automated tools, everything comes down to a long menu chain and 
terse messages from products not designed for debugging.






 So while there certainly exists lots of PDFs that you could grep, the numbers 
 are reducing daily...

 Leonard


Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Leonard Rosenthol
There is no such thing as canonical PDF - anything that complies with the PDF 
specification is valid.  That allows for various uses of compression, ASCII 
encoding, etc.  

There are certainly tools out there that will uncompress/defilter all the 
elements in the PDF so that it is plain text and can be searched using 
text-only tools - though certainly that wouldn't help you for modifications 
(for obvious reasons).   
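
As a rough idea of what such an uncompress/defilter pass does, here is a 
minimal Python sketch. It is not one of the existing tools: the names 
STREAM_PATTERN and inflate_streams are made up, it assumes LF line endings 
around the stream keywords, and it only handles Flate-compressed data. The 
output is meant for searching only; lengths and offsets in the file are no 
longer correct afterwards, which is the reason it is of no use for 
modifications.

```python
import re
import zlib

# Matches the data between the `stream` and `endstream` keywords
# (assumes a single LF after `stream` and before `endstream`).
STREAM_PATTERN = re.compile(rb"stream\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(data):
    """Return the file bytes with every Flate stream replaced by its plain data.

    Streams that do not inflate (uncompressed data or other filters)
    are left untouched.
    """
    def replace(match):
        try:
            plain = zlib.decompress(match.group(1))
        except zlib.error:
            return match.group(0)
        return b"stream\n" + plain + b"\nendstream"

    return STREAM_PATTERN.sub(replace, data)


# Hypothetical compressed object holding a link, for illustration only.
link = b"<< /S /URI /URI (http://www.1t3xt.info/) >>"
packed = (
    b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(link)
    + b"\nendstream\nendobj\n"
)

expanded = inflate_streams(packed)
print(b"http://www.1t3xt.info/" in expanded)
```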

That's why libraries such as iText exist - to provide you with higher level APIs 
(where possible).  They are what one would use to create automated test tools, 
validators, etc.   And many such tools already do exist - so it's definitely 
doable (and has been done).

And let us not forget the expression - just because you only have a hammer, 
doesn't mean everything is a nail!

Leonard


Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Mike Marchywka













 From: lrose...@adobe.com
 To: itext-questions@lists.sourceforge.net
 Date: Mon, 10 May 2010 18:09:15 -0700
 Subject: Re: [iText-questions] how to detect remote links in a PDF ?

 There is no such thing as canonical PDF - anything that complies with the 
 PDF specification is valid. That allows for various uses of compression, 
 ASCII encoding, etc.

 There are certainly tools out there that will uncompress/defilter all the 
 elements in the PDF so that it is plain text and can be searched using 
 text-only tools - though certainly that wouldn't help you for modifications 
 (for obvious reasons).

Well, not really. If there are rules for the PDF standard then you could in 
fact create some alternative representation. It could be super big, verbose, 
complicated, etc., but it may be a useful intermediate form for various types 
of work such as debugging or ad hoc editing, where you don't want to waste time 
writing custom code to do something simple. "XXX Intermediate Form" is a very 
common file format :) I guess you could imagine expanding it to some XML format 
where you have decompressed the text and done something with the images, fonts, 
and formatting information, no idea what. Essentially your claim is that PDF is 
so bizarre, unique, superlative, and singular that nothing can possibly equal 
it :) I just downloaded some schematic capture programs, and those create 
documents that are inherently graphical (schematics), but the essential 
features can be easily extracted as concise text netlists.


 That's why library such as iText exist - to provide you with higher level 
 APIs (where possible). They are what one would use to create automated test 
 tools, validators, etc. And many such tools already do exist - so it's 
 definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a 
standard", since then the argument is "well, I have API xyz and we can do 
anything with it, if you use my ABC format". I guess having a list would help: 
is there a PDF developer download somewhere with tools like this? This reminds 
me of when I first got here and you explained that logical structure was 
available, but every time it comes up in a concrete rather than hypothetical 
case everyone says, "Sure, you could preserve structure, but it is too 
complicated to be practical." In the present case, you say the tools exist, but 
when someone shows up with an error from Acrobat no one can point to a tool to 
check the PDF.



 And let us not forget the expression - just because you only have a hammer, 
 doesn't mean everything is a nail!

That's fine if you have a list of tools somewhere, but I keep seeing the same 
hammer being used, usually an Acrobat Reader with the informative diagnostic 
"your PDF is damaged". Again, I'm not saying this is a fault with ADBE or PDF, 
but it would be nice to refer people to some list of tools that give a better 
diagnostic. In many cases, of course, all you really care about is the text and 
the hammer gets almost everything done. When you need the graphics, that is a 
different situation.

So ok I've only got one swiss army knife LOL.



 Leonard


Re: [iText-questions] how to detect remote links in a PDF ?

2010-05-10 Thread Leonard Rosenthol
 There is no such thing as canonical PDF - anything that complies with the 
 PDF specification is valid. That allows for various uses of compression, 
 ASCII encoding, etc.

Well, not really. If there are rules for the PDF standard then you could in 
fact create some alternative representation- it could
be super big, verbose, complicated, etc but it may be a useful intermediate 
form for various types of work
such as debug or adhoc editing where you don't want to waste time writing 
custom code to do something simple. 

No argument!   

BUT an intermediate format (or an alternative format) and a canonical 
format are VERY VERY different things...

There are many folks who have developed alternative representations of PDF, 
whether in XML or other formats, including Adobe ourselves.  For example, Adobe 
has a project codenamed Mars on our Labs site 
(http://labs.adobe.com/wiki/index.php/Mars) which describes an XML+ZIP-based 
representation of PDF.  It supports all of the features of PDF from PDF 1.7.  
We provide some tooling for Acrobat & Reader, and you are welcome to develop 
your own. 

But again, that's NOT canonical - just alternative.


 That's why library such as iText exist - to provide you with higher level 
 APIs (where possible). They are what one would use to create 
 automated test tools, validators, etc. And many such tools already do exist 
 - so it's definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a 
standard", since then the argument is "well, I have API xyz and we can do 
anything with it, if you use my ABC format". I guess having a list would help: 
is there a PDF developer download somewhere with tools like this?

Adobe Acrobat Professional includes a PDF validator feature as part of its 
Preflight module, and has since version 7.  It is the only publicly available 
validator that I am aware of, though I have spoken to at least a half-dozen 
commercial PDF vendors that have told me that they have developed their own 
validators for their own use.

There used to be two limited open source validators - JHOVE 
(http://hul.harvard.edu/jhove/pdf-hul.html) and Multivalent 
(http://multivalent.sourceforge.net/Tools/pdf/Validate.html).   But to my 
knowledge, neither is currently supported/updated.   Since both were Java-based 
OSS, I would think you could pick them up and run with them if you wished.


Leonard




[iText-questions] how to detect remote links in a PDF ?

2010-05-09 Thread pieter vankeerberghen
Colleagues,

For an application, one needs to detect the hyperlinks (i.e. done with 
Chunk.setRemoteGoto) in a PDF which point to an other PDF, can someone 
point me to a solution ?

Thank you very much in advance,
Pieter Vankeerberghen
