Re: [iText-questions] how to detect remote links in a PDF ?
From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 10 May 2010 19:01:15 -0700
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

>>> There is no such thing as canonical PDF - anything that complies with the PDF specification is valid. That allows for various uses of compression, ASCII encoding, etc.

>> Well, not really. If there are rules for the PDF standard then you could in fact create some alternative representation - it could be super big, verbose, complicated, etc. - but it may be a useful intermediate form for various types of work, such as debugging or ad hoc editing, where you don't want to waste time writing custom code to do something simple.

> No argument! BUT an intermediate format (or an alternative format) and a canonical format are VERY VERY different things...

Well, at least canonical would be something like PDF that doesn't do anything fancy and has rules for otherwise arbitrary choices. Then you could do simple things like ASCII searches, and maybe binary diffs to test for pixel equality, etc.

> There are many folks who have developed alternative representations of PDF, whether in XML or other formats, including Adobe ourselves. For example, Adobe has a project codenamed Mars on our Labs site (http://labs.adobe.com/wiki/index.php/Mars) which describes an XML+ZIP-based representation of PDF. It supports all of the features of PDF from PDF 1.7. We provide some tooling for Acrobat Reader, and you are welcome to develop your own. But again, that's NOT canonical - just alternative.

But that would work for the original purpose too. Maybe you should mention these on iText somewhere and refer people to them. It is hard to say you will be accused of being biased any more than you already are, and if the tools work, who cares if you are biased? LOL. From your terse descriptions, that even sounds like a sane and workable approach, not what I would have expected (sorry, had to interject, LOL).

This is also not irrelevant to the iText implementation, as a prior thread was talking about optimizations at an algorithm level. If a parsed or intermediate form had some attributes that make various manipulations easy, it may be a good thing for iText to parse into it, or even write it out for other canned (iText-based or not) tools to use:

    cat pdf | itext_parse_to_intermediate_form | my_itext_tool | intermediate_to_pdf -O3 new.pdf

Piping can be slow, but obviously you can start mashing tools together, etc.

>>> That's why libraries such as iText exist - to provide you with higher-level APIs (where possible). They are what one would use to create automated test tools, validators, etc. And many such tools already do exist - so it's definitely doable (and has been done).

>> If you took that attitude you couldn't even hide behind "but PDF is a standard", since then the argument is "well, I have API xyz and we can do anything with it if you use my ABC format". I guess having a list would help; is there a PDF developer download somewhere with tools like this?

> Adobe Acrobat Professional includes a PDF validator feature as part of its Preflight module, and has since version 7. It is the only publicly available validator that I am aware of, though I have spoken to at least a half-dozen commercial PDF vendors that have told me that they have developed their own validators for their own use.
>
> There used to be two limited open source validators - JHOVE (http://hul.harvard.edu/jhove/pdf-hul.html) and Multivalent (http://multivalent.sourceforge.net/Tools/pdf/Validate.html). But to my knowledge, neither is currently supported/updated. Since both were Java-based OSS, I would think you could pick them up and run with them if you wished.

OK, those sound like reasonable starting points. I'm not saying it is trivial to do any of this, but it does seem that much of the traffic here never gets referred to any simple diagnostics.

> Leonard

--
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
Re: [iText-questions] how to detect remote links in a PDF ?
> Well, at least canonical would be something like PDF that doesn't do anything fancy and has rules for otherwise arbitrary choices. Then you could do simple things like ASCII searches, and maybe binary diffs to test for pixel equality, etc.

While the ability to do ASCII searches and maybe binary diffs are goals that you desire in a file format, neither was a goal that PDF had (or has), and so its design doesn't take either into account.

If you think those should be underlying goals for PDF 2.0 (ISO 32000-2) - NOW IS THE TIME FOR YOU TO GET INVOLVED! We are still working on the next version of PDF at the ISO. ANYONE can get involved on the committee, at NO COST. Just contact your country's standards body and volunteer. (If you don't know who that is, let me know what country you reside in and I will be happy to provide a name and email for you.)

PDF is a fully open standard. ANYONE can contribute. We WELCOME participation. BUT if you choose not to get involved, and we don't do the things you want, then your complaints will fall on deaf ears...

Leonard
Re: [iText-questions] how to detect remote links in a PDF ?
Leonard Rosenthol wrote:
> PDF is a fully open standard. ANYONE can contribute. We WELCOME participation.

I'm interested! I'm going to the Adobe Pulse next week:
http://events.adobe.co.uk/cgi-bin/register.cgi?country=uk&eventid=9660&venueid=9813
Is there somebody I can talk to about this?

best regards,
Bruno
Re: [iText-questions] how to detect remote links in a PDF ?
Yes, I believe that there should be a gentleman there named Colin that can help you...

Leonard

-----Original Message-----
From: Bruno Lowagie [mailto:br...@lowagie.com]
Sent: Tuesday, May 11, 2010 8:06 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

> I'm interested! I'm going to the Adobe Pulse next week. Is there somebody I can talk to about this?
Re: [iText-questions] how to detect remote links in a PDF ?
pieter vankeerberghen wrote:
> Colleagues,
> For an application, one needs to detect the hyperlinks (i.e. done with Chunk.setRemoteGoto) in a PDF which point to another PDF. Can someone point me to a solution?

Links are annotations, and annotations are referred to in the page dictionary of every page. So you need to loop over all the pages in an existing PDF and use getPageN(n) to get each page dictionary. Get the Annots entry from this dictionary and loop over the annotations. Check whether there are annotations with Subtype Link, and inspect those annotations.

Note that this is only one way to link to an external page. There could be links using other annotations too (e.g. involving a JavaScript action).

--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info
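The loop described above can be sketched independently of any PDF library. The sketch below is a model, not iText code: each page dictionary is a plain Map, the key names (Annots, Subtype, Link, A, S, GoToR, F, URI) are the ones used in the PDF specification, and the class and method names (RemoteLinkModel, findRemoteTargets, dict, samplePages) are invented for illustration. With iText you would take the real page dictionary from getPageN(n) and read the same keys from it. The sketch assumes that a link to another PDF appears as a Link annotation whose action has type GoToR with the target file in the F entry; F can also be a file specification dictionary, which is not handled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Model only: dictionaries are Maps, arrays are Lists, names and strings are Strings.
final class RemoteLinkModel {

    /** Returns "page N: target" for every Link annotation with a GoToR or URI action. */
    static List<String> findRemoteTargets(List<Map<String, Object>> pages) {
        List<String> found = new ArrayList<>();
        for (int n = 0; n < pages.size(); n++) {
            Object annots = pages.get(n).get("Annots");
            if (!(annots instanceof List)) {
                continue; // this page has no annotations
            }
            for (Object item : (List<?>) annots) {
                if (!(item instanceof Map)) {
                    continue;
                }
                Map<?, ?> annot = (Map<?, ?>) item;
                if (!"Link".equals(annot.get("Subtype"))) {
                    continue; // some other kind of annotation
                }
                Object a = annot.get("A");
                if (!(a instanceof Map)) {
                    continue; // e.g. a link that only has a Dest entry
                }
                Map<?, ?> action = (Map<?, ?>) a;
                Object type = action.get("S");
                if ("GoToR".equals(type)) {
                    found.add("page " + (n + 1) + ": " + action.get("F"));
                } else if ("URI".equals(type)) {
                    found.add("page " + (n + 1) + ": " + action.get("URI"));
                }
            }
        }
        return found;
    }

    /** Builds a dictionary from alternating keys and values. */
    static Map<String, Object> dict(Object... keyValues) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    /** Made-up three-page document: a remote link, a page without annotations, a web link. */
    static List<Map<String, Object>> samplePages() {
        Map<String, Object> remoteLink = dict("Subtype", "Link",
                "A", dict("S", "GoToR", "F", "other.pdf", "D", "chapter1"));
        Map<String, Object> note = dict("Subtype", "Text");
        Map<String, Object> webLink = dict("Subtype", "Link",
                "A", dict("S", "URI", "URI", "http://example.com/"));
        List<Map<String, Object>> pages = new ArrayList<>();
        pages.add(dict("Annots", Arrays.asList(remoteLink, note)));
        pages.add(dict());
        pages.add(dict("Annots", Arrays.asList(webLink)));
        return pages;
    }

    public static void main(String[] args) {
        for (String line : findRemoteTargets(samplePages())) {
            System.out.println(line);
        }
    }
}
```

As the answer above notes, this only covers one way of linking out; a JavaScript action, for example, would pass through this filter unnoticed.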
Re: [iText-questions] how to detect remote links in a PDF ?
Date: Sun, 9 May 2010 23:08:51 +0200
From: papa...@googlemail.com
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] how to detect remote links in a PDF ?

> Colleagues,
> For an application, one needs to detect the hyperlinks (i.e. done with Chunk.setRemoteGoto) in a PDF which point to another PDF. Can someone point me to a solution?

Question for Leonard or others who have read the spec: if you literally ONLY want to list the links, not parse the document or determine any context, are they likely to be hidden, or can you just use text tools to find strings that start with or contain http? For example:

    cat *.pdf ../Desktop/*.pdf | sed -e 's/[^a-zA-Z0-9/:.?]/\n/g' | grep http
    cat *.pdf ../Desktop/*.pdf | strings | grep http

These seem to work in that they find things with http, but I'm not sure what would be missing. Many of the matches seem to be surrounded by XML or prefixed with /A, but I'm not sure what other contexts may exist. Thanks.

> Thank you very much in advance,
> Pieter Vankeerberghen
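For anyone who wants the same heuristic inside a program rather than a shell pipeline, here is a minimal sketch of a rough Java equivalent of strings | grep http. The class and method names (HttpStringScanner, findHttpStrings) are made up for illustration and are not part of iText. It collects runs of printable ASCII bytes and keeps the runs that contain http, so it has the same limitation as the shell version: it only sees text that is stored uncompressed in the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an iText class: roughly mimics `strings | grep http`.
final class HttpStringScanner {

    // Shortest run worth reporting; 4 is the usual default of the strings tool.
    private static final int MIN_RUN = 4;

    /** Returns every run of printable ASCII bytes that contains "http". */
    static List<String> findHttpStrings(byte[] data) {
        List<String> hits = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i <= data.length; i++) {
            // Treat the end of the input like a non-printable byte so the last run is flushed.
            int b = (i < data.length) ? (data[i] & 0xFF) : -1;
            if (b >= 0x20 && b <= 0x7E) {
                run.append((char) b);
            } else {
                if (run.length() >= MIN_RUN && run.indexOf("http") >= 0) {
                    hits.add(run.toString());
                }
                run.setLength(0);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HttpStringScanner file1.pdf file2.pdf ...
        for (String path : args) {
            for (String hit : findHttpStrings(Files.readAllBytes(Paths.get(path)))) {
                System.out.println(path + ": " + hit);
            }
        }
    }
}
```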
Re: [iText-questions] how to detect remote links in a PDF ?
Prior to PDF 1.5, you could have done a grep (or equivalent), since only stream objects were compressed. However, as of PDF 1.5, we now have object streams, where groups of objects are placed into a stream and then compressed - which means that grep will no longer work.

Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, such as PDF/A) use object stream compression to keep file sizes down. I've been trying to recommend that other products do the same.

So while there certainly exist lots of PDFs that you could grep, the numbers are reducing daily...

Leonard

-----Original Message-----
From: Mike Marchywka [mailto:marchy...@hotmail.com]
Sent: Monday, May 10, 2010 3:51 AM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

> Question for Leonard or others who have read the spec: if you literally ONLY want to list the links, not parse the document or determine any context, are they likely to be hidden, or can you just use text tools to find strings that start with or contain http? For example:
>
>     cat *.pdf ../Desktop/*.pdf | sed -e 's/[^a-zA-Z0-9/:.?]/\n/g' | grep http
>     cat *.pdf ../Desktop/*.pdf | strings | grep http
>
> These seem to work in that they find things with http, but I'm not sure what would be missing. Many of the matches seem to be surrounded by XML or prefixed with /A, but I'm not sure what other contexts may exist. Thanks.
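To illustrate why a byte-level search stops working once objects sit inside a compressed stream, here is a minimal sketch using only java.util.zip. It is not a PDF parser: the sample bytes merely imitate a few link dictionaries, and the class and method names (GrepVersusFlate, contains, deflate, inflate, sampleObjects) are invented for this example. It assumes the stream uses Flate compression, which is the zlib/deflate format that the JDK classes below read and write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Illustration only: plain text is no longer findable in deflated bytes.
final class GrepVersusFlate {

    /** Naive byte search, the in-memory analogue of running grep on a raw file. */
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    /** Compresses the bytes into zlib format. */
    static byte[] deflate(byte[] plain) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(sink)) {
            z.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reverses deflate(). */
    static byte[] inflate(byte[] packed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InflaterInputStream z = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[512];
            int n;
            while ((n = z.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Made-up object data imitating three link annotations with URI actions. */
    static byte[] sampleObjects() {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 3; i++) {
            sb.append("<< /Type /Annot /Subtype /Link /A << /S /URI /URI (http://example.com/doc")
              .append(i)
              .append(".pdf) >> >>\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] needle = "http://".getBytes(StandardCharsets.US_ASCII);
        byte[] plain = sampleObjects();
        byte[] packed = deflate(plain);
        System.out.println("plain contains needle:    " + contains(plain, needle));
        System.out.println("deflated contains needle: " + contains(packed, needle));
        System.out.println("inflated contains needle: " + contains(inflate(packed), needle));
    }
}
```

The search only succeeds on the plain and the re-inflated bytes, which is why a text tool has to be preceded by a decompression step, or replaced by a library that understands the file structure.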
Re: [iText-questions] how to detect remote links in a PDF ?
From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 10 May 2010 06:44:13 -0700
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

> Prior to PDF 1.5, you could have done a grep (or equivalent) since only stream objects were compressed. However, as of PDF 1.5, we now have object streams, where groups of objects are placed into a stream and then compressed - which means that grep will no longer work.
>
> Adobe Acrobat 9 will ALWAYS (unless restricted by a specific ISO standard, such as PDF/A) use object stream compression to keep file sizes down. I've been trying to recommend that other products do the same.

Is there some utility, like in pdftk, to convert a PDF with arbitrary stuff in it to some standard or canonical format that lets it be used with other tools, so you don't have to write custom code for every little trivial variation of a thing you wish to accomplish? For example:

    cat xxx.pdf | pdf_to_standard_form | grep http

Obviously the applicability would go beyond the immediate question. It would also let people writing iText code have some way to check their results more easily than "it opened in proprietary Adobe product X, but in black box Y it greyed out 3 menu options and wouldn't let me save it unless blah blah blah".

There is nothing wrong with a human-readable end product, but given the complexity of these things it would be nice to use computers to automate certain things, like checking for links or other attributes. Without the ability to use automated tools, everything comes down to a long menu chain and terse messages from products not designed for debugging.

> So while there certainly exists lots of PDFs that you could grep, the numbers are reducing daily...
>
> Leonard
Re: [iText-questions] how to detect remote links in a PDF ?
There is no such thing as canonical PDF - anything that complies with the PDF specification is valid. That allows for various uses of compression, ASCII encoding, etc.

There are certainly tools out there that will uncompress/defilter all the elements in the PDF so that it is plain text and can be searched using text-only tools - though certainly that wouldn't help you for modifications (for obvious reasons).

That's why libraries such as iText exist - to provide you with higher-level APIs (where possible). They are what one would use to create automated test tools, validators, etc. And many such tools already do exist - so it's definitely doable (and has been done).

And let us not forget the expression - just because you only have a hammer, doesn't mean everything is a nail!

Leonard

-----Original Message-----
From: Mike Marchywka [mailto:marchy...@hotmail.com]
Sent: Monday, May 10, 2010 6:02 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

> Is there some utility, like in pdftk, to convert a PDF with arbitrary stuff in it to some standard or canonical format that lets it be used with other tools, so you don't have to write custom code for every little trivial variation of a thing you wish to accomplish? For example:
>
>     cat xxx.pdf | pdf_to_standard_form | grep http
>
> Obviously the applicability would go beyond the immediate question. It would also let people writing iText code have some way to check their results more easily than "it opened in proprietary Adobe product X, but in black box Y it greyed out 3 menu options and wouldn't let me save it unless blah blah blah".
>
> There is nothing wrong with a human-readable end product, but given the complexity of these things it would be nice to use computers to automate certain things, like checking for links or other attributes. Without the ability to use automated tools, everything comes down to a long menu chain and terse messages from products not designed for debugging.
Re: [iText-questions] how to detect remote links in a PDF ?
From: lrose...@adobe.com
To: itext-questions@lists.sourceforge.net
Date: Mon, 10 May 2010 18:09:15 -0700
Subject: Re: [iText-questions] how to detect remote links in a PDF ?

> There is no such thing as canonical PDF - anything that complies with the PDF specification is valid. That allows for various uses of compression, ASCII encoding, etc.
>
> There are certainly tools out there that will uncompress/defilter all the elements in the PDF so that it is plain text and can be searched using text-only tools - though certainly that wouldn't help you for modifications (for obvious reasons).

Well, not really. If there are rules for the PDF standard then you could in fact create some alternative representation - it could be super big, verbose, complicated, etc. - but it may be a useful intermediate form for various types of work, such as debugging or ad hoc editing, where you don't want to waste time writing custom code to do something simple. XXX Intermediate Form is a very common file format :)

I guess you could imagine expanding it to some XML format where you have decompressed the text and done something with the images, fonts, and formatting information - no idea what. Essentially your claim is that PDF is so bizarre, unique, superlative, and singular that nothing can possibly equal it :) I just downloaded some schematic capture programs, and those create documents that are inherently graphical - schematics - but the essential features can be easily extracted as concise text netlists.

> That's why library such as iText exist - to provide you with higher level APIs (where possible). They are what one would use to create automated test tools, validators, etc. And many such tools already do exist - so it's definitely doable (and has been done).

If you took that attitude you couldn't even hide behind "but PDF is a standard", since then the argument is "well, I have API xyz and we can do anything with it if you use my ABC format". I guess having a list would help; is there a PDF developer download somewhere with tools like this?

This reminds me of when I first got here and you explained that logical structure was available, but every time it comes up in a concrete rather than hypothetical case everyone says, "Sure, you could preserve structure, but it is too complicated to be practical." In the present case, you say the tools exist, but when someone shows up with an error from Acrobat no one can point to a tool to check the PDF.

> And let us not forget the expression - just because you only have a hammer, doesn't mean everything is a nail!

That's fine if you have a list of tools somewhere, but I keep seeing the same hammer being used, usually an Acrobat reader with the informative diagnostic "your pdf is damaged". Again, I'm not saying this is a fault with ADBE or PDF, but it would be nice to refer people to some list of tools that give a better diagnostic. In many cases, of course, all you really care about is the text, and the hammer gets almost everything done. When you need the graphics, that is a different situation. So OK, I've only got one Swiss army knife, LOL.

> Leonard
Re: [iText-questions] how to detect remote links in a PDF ?
>> There is no such thing as canonical PDF - anything that complies with the PDF specification is valid. That allows for various uses of compression, ASCII encoding, etc.

> Well, not really. If there are rules for the PDF standard then you could in fact create some alternative representation - it could be super big, verbose, complicated, etc. - but it may be a useful intermediate form for various types of work, such as debugging or ad hoc editing, where you don't want to waste time writing custom code to do something simple.

No argument! BUT an intermediate format (or an alternative format) and a canonical format are VERY VERY different things...

There are many folks who have developed alternative representations of PDF, whether in XML or other formats, including Adobe ourselves. For example, Adobe has a project codenamed Mars on our Labs site (http://labs.adobe.com/wiki/index.php/Mars) which describes an XML+ZIP-based representation of PDF. It supports all of the features of PDF from PDF 1.7. We provide some tooling for Acrobat Reader, and you are welcome to develop your own. But again, that's NOT canonical - just alternative.

>> That's why library such as iText exist - to provide you with higher level APIs (where possible). They are what one would use to create automated test tools, validators, etc. And many such tools already do exist - so it's definitely doable (and has been done).

> If you took that attitude you couldn't even hide behind "but PDF is a standard", since then the argument is "well, I have API xyz and we can do anything with it if you use my ABC format". I guess having a list would help; is there a PDF developer download somewhere with tools like this?

Adobe Acrobat Professional includes a PDF validator feature as part of its Preflight module, and has since version 7. It is the only publicly available validator that I am aware of, though I have spoken to at least a half-dozen commercial PDF vendors that have told me that they have developed their own validators for their own use.

There used to be two limited open source validators - JHOVE (http://hul.harvard.edu/jhove/pdf-hul.html) and Multivalent (http://multivalent.sourceforge.net/Tools/pdf/Validate.html). But to my knowledge, neither is currently supported/updated. Since both were Java-based OSS, I would think you could pick them up and run with them if you wished.

Leonard
[iText-questions] how to detect remote links in a PDF ?
Colleagues,

For an application, one needs to detect the hyperlinks (i.e. those created with Chunk.setRemoteGoto) in a PDF which point to another PDF. Can someone point me to a solution?

Thank you very much in advance,
Pieter Vankeerberghen