----------------------------------------
> From:
> To: itext-questions@lists.sourceforge.net
> Date: Tue, 26 May 2009 07:57:52 -0700
> Subject: Re: [iText-questions] How can compare the content of two revision
>
> Clearly, you haven't read the PDF Reference/ISO 32000-1 in order to
> understand PDF and all that it contains if you believe that your presented
> code is, in any way, a valid way to compare documents...

Having exchanged comments on this topic before, I'd point the OP to my prior
posts on extracting the various features that might be comparable. Failing
that, if you can render the text, it is easy to subtract two BMP files and
examine the difference. It depends on what you are looking for. I'm generally
trying to extract machine-readable information, so I care about extracting
text and maybe URLs, while fonts are something I wish I didn't have to
download (pdf2text, for example, doesn't care). So if you can figure out what
you want to measure and express that quantity in terms of attributes of the
PDF file, you stand a chance of writing code.

>
> Leonard
>
> -----Original Message-----
> From: OscarP
> Sent: Tuesday, May 26, 2009 10:30 AM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] How can compare the content of two revision
>
>
> Hi,
>
> Ok Michael, I was able to get the PDF contents, but my method doesn't work
> for all PDF files. For instance, I can't get it to work with PDF files
> generated with OpenOffice.
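
The bitmap subtraction mentioned above can be sketched roughly as follows.
This is only a minimal sketch, assuming both documents have already been
rendered to images of the same size by some external renderer (the rendering
step is not shown, and iText is not involved here). The names BitmapDiff and
countDifferingPixels are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of iText: compares two already-rendered pages.
final class BitmapDiff {

    /** Counts the pixels whose ARGB value differs between two same-sized images. */
    static int countDifferingPixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same dimensions");
        }
        int differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java BitmapDiff first.bmp second.bmp");
            return;
        }
        // ImageIO.read returns null when no registered reader understands the file.
        BufferedImage first = ImageIO.read(new File(args[0]));
        BufferedImage second = ImageIO.read(new File(args[1]));
        if (first == null || second == null) {
            System.err.println("could not decode one of the images");
            return;
        }
        System.out.println("differing pixels: " + countDifferingPixels(first, second));
    }
}
```

A count of zero only says the two renderings look identical at that
resolution; it says nothing about differences in the underlying PDF objects.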
>
> import java.io.*;
> import java.util.*;
>
> import com.lowagie.text.*;
> import com.lowagie.text.pdf.*;
>
> public class Example {
>     public static void main(String[] args) {
>         comprobar("d:\\pruebas\\PDFs\\textoF2I.pdf");
>     }
>
>     public static void comprobar(String fichero) {
>         System.out.println("/////////////////////////////////////");
>         System.out.println(fichero);
>         System.out.println("/////////////////////////////////////");
>         try {
>             PdfReader reader1 = new PdfReader(fichero);
>             System.out.println(obtenerPaginaPDF(reader1, 1));
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
>
>     public static String obtenerPaginaPDF(PdfReader reader, int i) {
>         try {
>             PdfDictionary page = reader.getPageN(i);
>             byte[] streamBytes = getStreamBytes(page);
>             PRTokeniser tokenizer = new PRTokeniser(streamBytes);
>             StringBuffer sb = new StringBuffer();
>             boolean arrayAbierto = false;
>             while (tokenizer.nextToken()) {
>                 if (tokenizer.getTokenType() == PRTokeniser.TK_STRING) {
>                     if (tokenizer.getStringValue().equals(" ") && !arrayAbierto)
>                         sb.append("\n");
>                     else
>                         sb.append(tokenizer.getStringValue());
>                 }
>                 else if (tokenizer.getTokenType() == PRTokeniser.TK_START_ARRAY) {
>                     arrayAbierto = true;
>                 }
>                 else if (tokenizer.getTokenType() == PRTokeniser.TK_END_ARRAY) {
>                     arrayAbierto = false;
>                     sb.append("\n");
>                 }
>             }
>             return sb.toString();
>         } catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>         return null;
>     }
>
>     private static byte[] getStreamBytes(PdfDictionary page) throws IOException {
>         PdfObject resources = page.get(PdfName.RESOURCES);
>
>         byte[] streamBytes = null;
>         if (resources instanceof PdfDictionary) {
>             try {
>                 PdfDictionary object = (PdfDictionary)
>                         ((PdfDictionary) resources).get(PdfName.XOBJECT);
>                 if (object != null) {
>                     Set set = object.getKeys();
>                     Iterator it = set.iterator();
>                     while (it.hasNext()) {
>                         PdfName s = (PdfName) it.next();
>                         if (object.get(s) instanceof PRIndirectReference) {
>                             PRIndirectReference objectReference =
>                                     (PRIndirectReference) object.get(s);
>                             PRStream stream = (PRStream) PdfReader
>                                     .getPdfObject(objectReference);
>                             streamBytes = PdfReader.getStreamBytes(stream);
>                         }
>                     }
>                 }
>             } catch (Exception e) {
>                 e.printStackTrace();
>             }
>         }
>         else if (resources instanceof PRIndirectReference) {
>             try {
>                 PdfDictionary object = (PdfDictionary) PdfReader.getPdfObject(resources);
>                 if (object != null) {
>                     Set set = object.getKeys();
>                     Iterator it = set.iterator();
>                     while (it.hasNext()) {
>                         PdfName s = (PdfName) it.next();
>                         if (object.get(s) instanceof PRIndirectReference) {
>                             PRIndirectReference objectReference =
>                                     (PRIndirectReference) object.get(s);
>                             PRStream stream = (PRStream) PdfReader
>                                     .getPdfObject(objectReference);
>                             streamBytes = PdfReader.getStreamBytes(stream);
>                         }
>                     }
>                 }
>             } catch (Exception e) {
>             }
>         }
>         if (streamBytes == null) {
>             PdfObject ob = page.get(PdfName.CONTENTS);
>             if (ob instanceof PRIndirectReference) {
>                 PRIndirectReference contents = (PRIndirectReference)
>                         page.get(PdfName.CONTENTS);
>                 PRStream streamContents = (PRStream) PdfReader.getPdfObject(contents);
>                 streamBytes = PdfReader.getStreamBytes(streamContents);
>             }
>             else if (ob instanceof PdfArray) {
>                 for (int j = 0; j < ((PdfArray) ob).size(); j++) {
>                     PRIndirectReference ir =
>                             (PRIndirectReference) ((PdfArray) ob).getPdfObject(j);
>                     PRStream streamContents = (PRStream) PdfReader.getPdfObject(ir);
>                     streamBytes = PdfReader.getStreamBytes(streamContents);
>                 }
>             }
>         }
>         return streamBytes;
>     }
> }
>
> Probably this is not the right way to get the PDF contents, but I see no
> other way to do it, and I don't know what else I can try.
>
> I have executed this code with these files:
> - Generated with Acrobat Professional
> http://www.nabble.com/file/p23723941/firmado2vecesOk.pdf firmado2vecesOk.pdf
> - Generated with Ghostscript
> http://www.nabble.com/file/p23723941/2274_2007_H_PROVISIONAL.pdf
> 2274_2007_H_PROVISIONAL.pdf
> - Generated with MS Word
> http://www.nabble.com/file/p23723941/Security%2BArchitecture.pdf
> Security+Architecture.pdf
> - Generated with OpenOffice
> http://www.nabble.com/file/p23723941/Prueba-para-Oscar.pdf
> Prueba-para-Oscar.pdf
>
> All the examples work "fine" (I haven't tested them with embedded images),
> except the OpenOffice one.
>
> Could you please show me an example of how to do this? Could you at least
> tell me what is going wrong?
>
>
> Thank you very much in advance.
>
>
>
> mkl wrote:
>>
>> Oscar,
>>
>>
>> OscarP wrote:
>>>
>>> OK,
>>> I spent several days working on this, but I cannot find anything out. How
>>> can I get those differences? I've analysed the binary of this document
>>> http://www.nabble.com/file/p23704652/textoF2IMod.pdf textoF2IMod.pdf ,
>>> but the object with the difference (70 0) returns null with iText
>>> (reader.refObj[70]).
>>>
>>
>> 70 0 contains a cross-reference stream. iText hides away the cross-reference
>> streams it comes across when collecting cross-reference information by
>> explicitly marking the matching entry in memory as a freed object ("if
>> (thisStream < xref.length) xref[thisStream] = -1;" in
>> PdfReader.readXRefStream).
>>
>> (Actually, 70 0 is the cross-reference stream holding only the information
>> about object 70 0...)
>>
>> The rationale for this might be some self-protection; usually you never
>> tamper with any former cross-reference tables or streams. When trying to
>> inspect a PDF in detail, this is a bit inconvenient, though.
>>
>>
>> OscarP wrote:
>>>
>>> To sum it all up, I need to know whether there are differences between
>>> one signature and the other. I'd be very grateful if you could tell me
>>> the way to get that result with iText.
>>>
>>
>> Whether there are differences between the signatures? Do you mean the
>> signature containers or the whole signature dictionaries? Either way, they
>> are directly available from the AcroFields, aren't they?
>>
>> Regards, Michael.
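
On comparing revisions at the byte level: each incremental update appended to
a PDF ends with its own %%EOF marker, so one rough way to locate candidate
revision boundaries without going through iText's object table is to scan the
raw file for those markers. The following is only a sketch under that
assumption. RevisionBoundaries and candidateRevisionEnds are hypothetical
names, and a linearized file, or a stream that happens to contain the bytes
"%%EOF", would produce extra candidates.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: works on the raw bytes of the file.
final class RevisionBoundaries {
    private static final byte[] EOF_MARKER =
            "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns, for every "%%EOF" found, the offset just past the marker. */
    static List<Integer> candidateRevisionEnds(byte[] pdf) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i + EOF_MARKER.length <= pdf.length; i++) {
            if (matchesAt(pdf, i)) {
                ends.add(i + EOF_MARKER.length);
                i += EOF_MARKER.length - 1; // continue after this marker
            }
        }
        return ends;
    }

    private static boolean matchesAt(byte[] data, int offset) {
        for (int k = 0; k < EOF_MARKER.length; k++) {
            if (data[offset + k] != EOF_MARKER[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java RevisionBoundaries file.pdf");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        for (int end : candidateRevisionEnds(pdf)) {
            System.out.println("candidate revision ends at byte " + end);
        }
    }
}
```

Each offset is the length of a candidate revision; the first N bytes of the
file could then be treated as a document of their own and compared with the
next candidate by whatever measure you have settled on.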
>>
>
> --
> View this message in context:
> http://www.nabble.com/How-can-compare-the-content-of-two-revision-tp23649348p23723941.html
> Sent from the iText - General mailing list archive at Nabble.com.
>
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list: http://1t3xt.info/tutorials/keywords/