----------------------------------------
> From: lrose...@adobe.com
> To: itext-questions@lists.sourceforge.net
> Date: Tue, 26 May 2009 08:56:22 -0700
> Subject: Re: [iText-questions] How can compare the content of two revision
>
> Comparison of "just text" isn't a good approach, since that certainly doesn't 
> compare any other aspects of the page (or the document). Also, your approach 
> to "text extraction" doesn't really yield text; it's just the raw operators 
> from the PDF. Without taking into account the font & encoding information, 
> you don't have text and thus aren't even comparing the right thing there...
>
> And you have a LOT of work ahead of you...


I have skipped details in the thread and am still plowing
through my inbox, but if you really WANT to compare
text, and that is all I normally need, then pdf2text would work
along with diff. Further, see the code I posted, or grep the iText
source for "matrix", and you can find ways to extract text with position
information. Often the font information is just a large distraction,
but sometimes the document authors find it quite important.
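
A minimal sketch of the "extract, then diff" idea, assuming the text of
each revision has already been extracted to plain strings by some external
tool. The function names and the sample strings are made up for
illustration; this only compares text, so fonts, images and layout are
ignored, which is exactly the limitation Leonard points out.

```python
import difflib


def normalize(text):
    """Collapse whitespace and drop blank lines so that spacing-only
    changes in the extracted text do not show up as differences."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def diff_text(old_text, new_text, old_name="rev1.txt", new_name="rev2.txt"):
    """Return unified-diff lines between two extracted page texts.

    An empty list means the normalized text of both revisions is identical.
    """
    return list(
        difflib.unified_diff(
            normalize(old_text),
            normalize(new_text),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical extracted text from two revisions of the same page.
    old = "Invoice 42\nTotal:   100 EUR\n"
    new = "Invoice 42\nTotal: 120 EUR\n"
    for line in diff_text(old, new):
        print(line)
```

This is the same thing running diff on the two extracted text files would
give you; the whitespace normalization is a guess at what usually matters
and can be dropped if exact spacing is significant.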




>
> Leonard
>
> -----Original Message-----
> From: OscarP [mailto:opasc...@gmail.com]
> Sent: Tuesday, May 26, 2009 11:44 AM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] How can compare the content of two revision
>
>
> Hi Leonard,
>
>
> OscarP wrote:
>>
>> Probably this is not the right way to get the PDF contents, but I see no
>> other way to do it, and I don't
>> know what else I can try.
>>
>
> I know this. My code extracts only the text of one page and then compares
> this text with another page. If I knew how to do it, I wouldn't ask for
> help here.
>
> But you are right, I haven't read the PDF Reference/ISO 32000-1. I am reading
> it now.
>
> Thanks anyway
>
>
> Leonard Rosenthol-3 wrote:
>>
>> Clearly, you haven't read the PDF Reference/ISO 32000-1 in order to
>> understand PDF and all that it contains if you believe that your presented
>> code is, in any way, a valid way to compare documents...
>>
>> Leonard

