Good question, Leonard.

If someone tried to use the approach I recommended to extract text directly 
from a locked and encrypted PDF, rather than from the PS print file, it would 
fail: the strings and content streams in an encrypted PDF are stored 
encrypted, so the regex used in the match/substitution would never see the 
target text fragments.

The context of the OP's question suggested he was trying to read text content 
from a PDF with a program. Naturally, the procedure I recommended requires the 
user to be able to open the PDF with Reader and print it to a PS file.
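
For what it's worth, steps 2 and 3 of that procedure could be sketched roughly 
as follows. This is only an illustrative sketch in Python: the function name 
extract_fragments, the regular expression, and the sample PS lines are my own 
assumptions, and it presumes the printed PS file carries its text as literal 
"(text fragment)Tj" sequences. Lines ending in TJ pass the filter in step 2, 
but this simple pattern does not unpack TJ arrays.

```python
import re

# Matches a literal string followed by the Tj operator, e.g. "(Hello)Tj" or
# "(Hello) Tj". Backslash escapes such as \( and \) inside the string are allowed.
TJ_PATTERN = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_fragments(ps_text):
    """Return the text fragments found in "(text fragment)Tj" sequences."""
    fragments = []
    for line in ps_text.splitlines():
        stripped = line.rstrip()
        # Step 2: drop every line that does not end with Tj or TJ.
        if not stripped.endswith(("Tj", "TJ")):
            continue
        # Step 3: pull each "text fragment" out of "(text fragment)Tj".
        fragments.extend(TJ_PATTERN.findall(stripped))
    return fragments


# Hypothetical sample input, made up for this example.
sample = """%!PS-Adobe-3.0
/F1 12 Tf
(Hello) Tj
72 700 Td
(wor) Tj (ld) Tj
[(Ke) -20 (rned)] TJ
showpage
"""

for fragment in extract_fragments(sample):
    print(fragment)  # prints Hello, wor, ld on separate lines
```

The fragments come out in file order, which need not be reading order, so the 
output may still need to be rearranged by hand.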

I don't understand your question about "read it to just read it" in the context 
of the OP's question. I think we're talking about text extraction. Please 
rephrase your question.

BTW, this issue was thoroughly discussed on comp.text.pdf about 4-5 years ago.

Cheers,
Bill Segraves
  ----- Original Message ----- 
  From: Leonard Rosenthol 
  To: Post all your questions about iText here 
  Sent: Sunday, October 07, 2007 6:57 PM
  Subject: Re: [iText-questions] u guys konw how to read the data from pdf using java itext ?


  Why would using this approach fail on a PDF that is locked with a  
  password?  In order to print the PDF, you have to have the ability to  
  READ it ;).  And if you can read it in order to print it, you can  
  read it to just read it...Right?

  Leonard


  On Oct 7, 2007, at 11:05 AM, [EMAIL PROTECTED] wrote:

  > If the PDF is locked with a password, but still printable, the  
  > approach offered by this author is one that would work, while  
  > attempting to use this approach on the original PDF would fail.  
  > This author was simply trying to help the poster with an approach  
  > that would avoid the frustration that would ensue if he tried to  
  > work with an original locked PDF.
  >
  >
  > Of course, the approach espoused by the esteemed sage would be  
  > easier, for both locked and unlocked PDFs. OTOH, this author  
  > doesn't count an approach that makes it easier to fail as acceptable.
  >
  >
  > Cheers,
  >
  > Bill Segraves
  >
  > -------------- Original message from Leonard Rosenthol  
  > <[EMAIL PROTECTED]>: --------------
  >
  >
  > > Why would working through the PostScript be easier than doing this on
  > > the original PDF?
  > >
  > > You can get to all the PDF operators just fine.
  > > Font & text information is more easily referenceable from the PDF.
  > > PostScript also has "XObjects", Patterns, etc. that may contain text.
  > > etc.
  > >
  > > Not understanding the logic :(.
  > >
  > > Leonard
  > >
  > >
  > > On Oct 6, 2007, at 4:53 PM, [EMAIL PROTECTED] wrote:
  > >
  > > > Yes; but it is not practicable with iText. You could, however, as
  > > > long as the PDF is printable, use the following procedure:
  > > >
  > > > 1. Print to a PS file.
  > > >
  > > > 2. Scan the PS file from step 1 above, dropping all lines that
  > > > do not end with Tj or TJ.
  > > >
  > > > 3. Use a regular expression (together with Substitution or
  > > > Match) to extract the instances of "text fragment" from within
  > > > multiple instances of "(text fragment)Tj", printing the resulting
  > > > text fragments to STDOUT.
  > > >
  > > > Bruno has given an excellent example of why you should not expect
  > > > the resulting output to make sense, i.e., the text fragments may
  > > > not appear in the order in which you'd like for them to appear.
  > > >
  > > > Cheers,
  > > >
  > > > Bill Segraves
  > > >
  > > > -------------- Original message from krammark: --------------
  > > >
  > > >
  > > > >
  > > > > so , how we read the data from pdf ?
  > > > > i mean , can we read them line by line from the specific pages ?
  > > > >
  > > > > thanks buddy.
  > > > >
  > > > >
  > > > > Bruno Lowagie (iText) wrote:
  > > > > >
  > > > > > krammark wrote:
  > > > > >> hey guys,
  > > > > >> do u guys have a idea how to read the data from pdf pages using itext ?
  > > > > >> basically, i want to read the data from table and write them into excel
  > > > > >> files.
  > > > > >> is that possible ?
  > > > > >
  > > > > > There is no such thing as 'a table' in plain PDF.
  > > > > > It's just lines and words painted on a canvas,
  > > > > > possibly in an arbitrary order.
  > > > > >
  > > > > > Unless your table cells are form fields, or your
  > > > > > PDF contains specific table structures (Tagged PDF),
  > > > > > iText probably won't help you.
  > > > > >
  > > > > > br,
  > > > > > Bruno
  > > > > >
  > > > > >
  > > > > > -------------------------------------------------------------------------
  > > > > > This SF.net email is sponsored by: Splunk Inc.
  > > > > > Still grepping through log files to find problems? Stop.
  > > > > > Now Search log events and configuration files using AJAX and a browser.
  > > > > > Download your FREE copy of Splunk now >> http://get.splunk.com/
  > > > > > _______________________________________________
  > > > > > iText-questions mailing list
  > > > > > [email protected]
  > > > > > https://lists.sourceforge.net/lists/listinfo/itext-questions
  > > > > > Buy the iText book: http://itext.ugent.be/itext-in-action/
  > > > > >
  > > > > >
  > > > >
  > > > > --
  > > > > View this message in context:
  > > > > http://www.nabble.com/u-guys-konw-how-to-read-the-data-from-pdf-using-java-itext---tf4572506.html#a13067937
  > > > > Sent from the iText - General mailing list archive at Nabble.com.
  > > > >
  > > > >
  > > > >
  > >
  > >


