I'm surprised that pstotext fails; iText certainly doesn't generate anything
strange. If you generated the documents yourself with iText, using WinAnsi
font encoding, you can read the document with iText, open the content stream
and parse the text. To parse the text, find the first '(' and read until the
next unescaped ')', handling the '\' escapes along the way.
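
As a rough illustration of that scan, here is a minimal sketch in plain Java.
It assumes you already have the decompressed content stream as a String; the
class and method names (LiteralStringScanner, extractLiterals) are made up for
this example and are not part of iText. It also tracks nested, unescaped
parentheses, which PDF allows inside a string, and it maps octal escapes
straight to char values, which is only right for WinAnsi codes that coincide
with Latin-1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an iText class: pulls (...) literal strings
// out of an already-decompressed PDF content stream.
final class LiteralStringScanner {

    /** Returns the decoded contents of every (...) literal found in the text. */
    static List<String> extractLiterals(String content) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            if (content.charAt(i) != '(') { i++; continue; }
            i++;                                   // skip the opening '('
            StringBuilder sb = new StringBuilder();
            int depth = 1;                         // balanced parens may nest
            while (i < content.length() && depth > 0) {
                char c = content.charAt(i++);
                if (c == '\\' && i < content.length()) {
                    char e = content.charAt(i++);
                    switch (e) {
                        case 'n': sb.append('\n'); break;
                        case 'r': sb.append('\r'); break;
                        case 't': sb.append('\t'); break;
                        case 'b': sb.append('\b'); break;
                        case 'f': sb.append('\f'); break;
                        case '\r':                 // backslash + EOL: line continuation
                            if (i < content.length() && content.charAt(i) == '\n') i++;
                            break;
                        case '\n': break;          // line continuation
                        default:
                            if (e >= '0' && e <= '7') {
                                // octal escape, one to three digits
                                int v = e - '0';
                                int digits = 1;
                                while (digits < 3 && i < content.length()
                                        && content.charAt(i) >= '0'
                                        && content.charAt(i) <= '7') {
                                    v = v * 8 + (content.charAt(i++) - '0');
                                    digits++;
                                }
                                sb.append((char) (v & 0xFF));
                            } else {
                                sb.append(e);      // covers \( \) \\ and unknown escapes
                            }
                    }
                } else if (c == '(') {
                    depth++;
                    sb.append(c);
                } else if (c == ')') {
                    depth--;
                    if (depth > 0) sb.append(c);   // the closing ')' itself is dropped
                } else {
                    sb.append(c);
                }
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

Calling LiteralStringScanner.extractLiterals(streamText) gives the strings in
the order they appear. The sketch does not look at the text operators, so it
will not tell you where words or lines break, and it would be confused by a
'(' inside a comment or inline image data.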

Best Regards,
Paulo Soares

> -----Original Message-----
> From: Matt Benson [SMTP:[EMAIL PROTECTED]]
> Sent: Tuesday, November 26, 2002 17:52
> To:   Paulo Soares; itext-questions
> Subject:      RE: [iText-questions] PDF metadata
> 
> Thanks for the suggestion, Paulo.  Now, what would you
> use to do the extraction?  I have been playing with
> pstotext which uses gs behind the scenes, but gs is
> choking on the iText-generated PDF.  I have looked at
> jPedal, but not deeply as it seems to lack an
> intuitive enough API that I could get started quickly
> without reading the source code of the examples.
> 
> -Matt
> 
> --- Paulo Soares <[EMAIL PROTECTED]> wrote:
> > Taking out the metadata won't help you, as there are no guarantees
> > that the layout engine is the same from version to version; the
> > text may look the same but the internal representation is
> > different. The best way is to compute a checksum of the text (words
> > only, skipping the whitespace) and store that information in the
> > PDF metadata as a new key. The already generated PDFs can have the
> > text extracted, the checksum calculated and applied to the same
> > PDF.
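
A minimal sketch of the checksum step described above, assuming the text has
already been extracted into a String. The class and method names (TextChecksum,
checksumOfWords) are hypothetical, and SHA-256 is my own choice here; the
message does not name an algorithm. Writing the result into the PDF metadata
is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: checksum over the words of a text, ignoring whitespace.
final class TextChecksum {

    /** Returns a lowercase hex SHA-256 digest of the text with all whitespace removed. */
    static String checksumOfWords(String text) {
        // Dropping every whitespace run makes the result independent of
        // line breaks, indentation and spacing between words.
        String wordsOnly = text.replaceAll("\\s+", "");
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(wordsOnly.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the whitespace is removed entirely, "Hello world" and "Helloworld"
produce the same checksum. That follows the "skipping the whitespace" wording
literally; if word boundaries matter, join the words with a single space
instead.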
> > 
> > Best Regards,
> > Paulo Soares
> > 
> > > -----Original Message-----
> > > From:     Matt Benson [SMTP:[EMAIL PROTECTED]]
> > > Sent:     Tuesday, November 26, 2002 15:40
> > > To:       itext-questions
> > > Subject:  [iText-questions] PDF metadata
> > > 
> > > We are using iText to convert text files to PDF as outlined in
> > > the FAQ.  This works; however I want to take a checksum of the
> > > PDF created and use it in conjunction with some other
> > > information to verify we have not created this file before.
> > > What I am finding, however, is that the metadata of the PDF
> > > always differs between iText versions as well as creation
> > > date/time, so I cannot create the exact same file twice and thus
> > > cannot rely on a checksum.  I could use the checksum from the
> > > input file, except that this is a modification to a production
> > > application and we no longer have the input files for the
> > > existing data.  So to do this I would have to extract the text
> > > to get an approximation of the original file.  If I did this,
> > > the checksum would represent slightly different things from the
> > > old to the new data.  What I am wondering about is whether these
> > > variable pieces of metadata are vital to the PDF structure, and
> > > if not, what would it take to remove them?  Alternatively, if
> > > anyone has a better idea then those are welcome too.
> > > 
> > > Thanks,
> > > Matt
> > > 
> 
> 


-------------------------------------------------------
This SF.net email is sponsored by: Get the new Palm Tungsten T 
handheld. Power & Color in a compact size! 
http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0002en
_______________________________________________
iText-questions mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/itext-questions
