Prior to PDF 1.5, you could have used grep (or an equivalent), since only stream 
objects were compressed.  However, as of PDF 1.5 we have "object streams", 
where groups of objects are placed into a stream and then compressed, which 
means that grep can no longer see the objects inside them.
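
To make that concrete, here is a minimal Python sketch of the effect. The object numbers, the example.com URL and the `contains_http` helper are made up for illustration, and the "files" are only fragments rather than valid PDFs (no header, xref or trailer).

```python
import zlib

# Hypothetical URI action dictionary, roughly as a link would store it.
link_object = b"<< /Type /Action /S /URI /URI (http://example.com/other.pdf) >>"

# Pre-1.5 layout: the dictionary sits in the file as plain bytes.
plain_file = b"12 0 obj\n" + link_object + b"\nendobj\n"

# PDF 1.5 layout: the same dictionary goes into an object stream.
# The stream body starts with "object number, offset" pairs, then the objects.
stream_body = b"12 0 " + link_object
compressed = zlib.compress(stream_body)
objstm_file = (
    b"13 0 obj\n<< /Type /ObjStm /N 1 /First 5 /Filter /FlateDecode /Length "
    + str(len(compressed)).encode("ascii")
    + b" >>\nstream\n" + compressed + b"\nendstream\nendobj\n"
)


def contains_http(data):
    """Rough stand-in for `grep http` on raw bytes."""
    return b"http" in data


if __name__ == "__main__":
    print("plain file:", contains_http(plain_file))
    print("object stream file:", contains_http(objstm_file))
    print("after inflating:", contains_http(zlib.decompress(compressed)))
```

The URL only becomes searchable again once the stream has been inflated, which is the step a plain grep never performs.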

Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, such 
as PDF/A) use object stream compression to keep file sizes down.  I've been 
trying to recommend that other products do the same.

So while there certainly exist lots of PDFs that you could grep, their number 
is shrinking daily...

Leonard

-----Original Message-----
From: Mike Marchywka [mailto:marchy...@hotmail.com] 
Sent: Monday, May 10, 2010 3:51 AM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

----------------------------------------
> Date: Sun, 9 May 2010 23:08:51 +0200
> From: papa...@googlemail.com
> To: itext-questions@lists.sourceforge.net
> Subject: [iText-questions] how to detect remote links in a PDF ?
>
> Colleagues,
>
> For an application, one needs to detect the hyperlinks (i.e. done with
> Chunk.setRemoteGoto) in a PDF which point to an other PDF, can someone
> point me to a solution ?

Question for Leonard or others who have read the spec: if you literally ONLY
want to list the links, not parse the document or determine any context,
are they likely to be hidden, or can you just use text tools to find strings
that start with or contain "http"? For example,


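  # split on anything that cannot be part of a URL, then keep the http lines
  cat *.pdf ../Desktop/*.pdf | sed -e 's/[^a-zA-Z0-9/:.?]/\n/g' | grep http
  # or let strings pull out the printable runs first
  cat *.pdf ../Desktop/*.pdf | strings | grep http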

These seem to work in that they find things containing http, but I'm not sure
what they would miss. Many of the hits seem to be surrounded by XML or prefixed
with "/A", but I'm not sure what other contexts may exist.
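
One obvious gap is anything sitting inside a compressed stream. A rough Python sketch of a slightly less naive scan follows: it tries to inflate every stream and then looks for "/URI (...)" entries. The `find_uris` name and both regular expressions are my own assumptions rather than anything from the spec or from iText, and the sketch ignores encryption, filters other than Flate, hex strings and escaped parentheses.

```python
import re
import zlib

# Everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A literal-string URI entry such as: /URI (http://example.com/x.pdf)
URI_RE = re.compile(rb"/URI\s*\((.*?)\)")


def find_uris(pdf_bytes):
    """Return the URI strings found in the raw bytes and in inflated streams."""
    chunks = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream"
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data; its bytes were already scanned as-is
    uris = []
    for chunk in chunks:
        for raw in URI_RE.findall(chunk):
            text = raw.decode("latin-1")
            if text not in uris:
                uris.append(text)
    return uris


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            for uri in find_uris(handle.read()):
                print(path, uri)
```

Saved as a script (any file name will do), it takes PDF paths as arguments and prints one "path URI" pair per line. A real PDF library is still the safer route for anything beyond a quick look.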

Thanks.

>
> Thank you very much in advance,
> Pieter Vankeerberghen
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.itextpdf.com/book/
> Check the site with examples before you ask questions: 
> http://www.1t3xt.info/examples/
> You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
                                          