Prior to PDF 1.5, you could have done a grep (or equivalent) since only stream objects were compressed. However, as of PDF 1.5, we now have "object streams", where groups of objects are placed into a stream and then compressed - which means that grep will no longer work.
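To make that concrete, here is a minimal sketch in Python, assuming a toy byte blob rather than a real PDF. The helper names (build_toy_pdf, grep_http, inflate_streams) and the example.com URL are made up for illustration and are not part of iText or any PDF library. It only shows why a byte-level search misses a URI once the object holding it sits inside a Flate-compressed object stream.

```python
import re
import zlib

# Hypothetical link annotation, roughly as it could appear in a PDF body.
LINK_OBJECT = (b"<< /Type /Annot /Subtype /Link "
               b"/A << /S /URI /URI (http://example.com/doc.pdf) >> >>")


def build_toy_pdf(use_object_stream):
    """Return a toy blob (NOT a valid PDF: no xref, no trailer)."""
    if not use_object_stream:
        # Pre-1.5 style: the object is stored in the clear.
        return b"%PDF-1.4\n2 0 obj\n" + LINK_OBJECT + b"\nendobj\n"
    # 1.5 style: object 2 lives inside an object stream. The stream data
    # starts with "object-number offset" pairs, then the objects themselves.
    index = b"2 0 "
    payload = zlib.compress(index + LINK_OBJECT)
    header = (b"<< /Type /ObjStm /N 1 /First " + str(len(index)).encode()
              + b" /Filter /FlateDecode /Length "
              + str(len(payload)).encode() + b" >>")
    return (b"%PDF-1.5\n1 0 obj\n" + header + b"\nstream\n"
            + payload + b"\nendstream\nendobj\n")


def grep_http(data):
    """Byte-level search, comparable in spirit to `strings | grep http`."""
    return re.findall(rb"https?://[A-Za-z0-9/:.?_=&%-]+", data)


def inflate_streams(data):
    """Inflate every stream whose /Length is a direct integer.

    Simplification: real files may use an indirect /Length, other
    filters, predictors or encryption, none of which is handled here.
    """
    out = []
    for m in re.finditer(rb"/Length (\d+)[^\n]*\nstream\r?\n", data):
        start = m.end()
        raw = data[start:start + int(m.group(1))]
        try:
            out.append(zlib.decompress(raw))
        except zlib.error:
            pass  # not Flate data; skip it
    return b"\n".join(out)


if __name__ == "__main__":
    for compressed in (False, True):
        blob = build_toy_pdf(compressed)
        print("object stream:", compressed)
        print("  raw search:     ", grep_http(blob))
        print("  after inflating:", grep_http(inflate_streams(blob)))
```

For real files, a proper PDF parser is the safer route; the sketch is only meant to show where the plain-text search stops seeing the link.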
Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, such as PDF/A) use object stream compression to keep file sizes down. I've been trying to recommend that other products do the same.

So while there certainly exist lots of PDFs that you could grep, the numbers are reducing daily...

Leonard

-----Original Message-----
From: Mike Marchywka [mailto:marchy...@hotmail.com]
Sent: Monday, May 10, 2010 3:51 AM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

----------------------------------------
> Date: Sun, 9 May 2010 23:08:51 +0200
> From: papa...@googlemail.com
> To: itext-questions@lists.sourceforge.net
> Subject: [iText-questions] how to detect remote links in a PDF ?
>
> Colleagues,
>
> For an application, one needs to detect the hyperlinks (i.e. done with
> Chunk.setRemoteGoto) in a PDF which point to another PDF, can someone
> point me to a solution ?

Question for Leonard or others who have read the spec: if you literally ONLY want to list the links, not parse the document or determine any context, are they likely to be hidden, or can you just use text tools to find strings that start with or contain "http"? For example,

cat *.pdf ../Desktop/*.pdf | sed -e 's/[^a-zA-Z0-9/:.?]/\n/g' | grep http
cat *.pdf ../Desktop/*.pdf | strings | grep http

These seem to work in that they find things with http, but I'm not sure what would be missing. Many of these seem to be surrounded by XML or prefixed with "/A", but I'm not sure what other contexts may exist.

Thanks.
>
> Thank you very much in advance,
> Pieter Vankeerberghen
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.itextpdf.com/book/
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list: http://1t3xt.info/tutorials/keywords/