RE: [iText-questions] Reading and Extracting Text from PDF

Richard Braman Wed, 15 Feb 2006 11:27:08 -0800

I am a little confused by this snippet.  Where are you getting the stream
from? I know it's a PRStream, but which constructor do you use to create
the PRStream?  There must be some code before this.


-----Original Message-----
From: Bruno Lowagie [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 15, 2006 2:46 AM
To: [EMAIL PROTECTED]
Cc: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] Reading and Extracting Text from PDF


Richard Braman wrote:

>My guess is that
>there is a way to iterate through the dictionary and get what I want 
>(like Bruno showed me how to do with the AcroForm.Fields), any code to 
>do that would be great.
>
You need the class PRTokeniser to do this.
I don't recommend it, because it won't work for
most PDF files, but glancing at the content stream
you posted, this might work (don't expect this
code to work for every PDF):

// 'stream' is the PRStream that holds the content stream
byte[] streamBytes = PdfReader.getStreamBytes(stream);
PRTokeniser tokenizer = new PRTokeniser(streamBytes);
// Walk the tokens and print only the string operands
while (tokenizer.nextToken()) {
  if (tokenizer.getTokenType() == PRTokeniser.TK_STRING) {
    System.out.println(tokenizer.getStringValue());
  }
}
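
To make it clearer what such a loop collects, here is a rough,
library-free sketch of the same idea: scan the decoded content stream
bytes and keep only the literal strings written as ( ... ). The class
LiteralStringScanner and its extract method are hypothetical names made
up for this illustration; they are not part of iText, and this is not
how PRTokeniser is implemented. The sketch assumes the bytes are already
decoded, ignores hex strings written as < ... >, and does not interpret
escape sequences such as \n or octal codes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of iText): collects "( ... )" literal strings. */
final class LiteralStringScanner {

    /** Returns the literal strings found in decoded content stream bytes, in order. */
    static List<String> extract(byte[] content) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < content.length) {
            if (content[i] != '(') {
                i++; // skip operators, numbers, names and everything else
                continue;
            }
            i++; // step past the opening parenthesis
            StringBuilder text = new StringBuilder();
            int depth = 1; // balanced, unescaped parentheses may nest inside a string
            while (i < content.length && depth > 0) {
                char c = (char) (content[i++] & 0xff);
                if (c == '\\' && i < content.length) {
                    // Simplification: keep the escaped character as-is,
                    // without interpreting \n, \r, \t or octal escapes.
                    text.append((char) (content[i++] & 0xff));
                } else if (c == '(') {
                    depth++;
                    text.append(c);
                } else if (c == ')') {
                    depth--;
                    if (depth > 0) {
                        text.append(c); // closing parenthesis of a nested pair
                    }
                } else {
                    text.append(c);
                }
            }
            found.add(text.toString());
        }
        return found;
    }
}
```

Calling LiteralStringScanner.extract(streamBytes) on a stream that
contains [(Hel) -20 (lo)] TJ gives two separate pieces, "Hel" and "lo".
That hints at why this approach fails for many PDFs: the strings are
fragments of character codes in whatever encoding the font uses, not
necessarily readable text.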

br,
Bruno


