I believe "hidden text" in that context is text that has been drawn with Text Render Mode 3 (no stroke, no fill). It may include text that has been clipped as well.
In either case, iText will pick it up. I see a lot of "unknown character" characters in that screen shot. If Acrobat cannot successfully extract the text content, iText won't be able to either. Can we see the PDF? --Mark Storer Senior Software Engineer Cardiff.com import legalese.Disclaimer; Disclaimer<Cardiff> DisCard = null; > -----Original Message----- > From: Michael [mailto:[email protected]] > Sent: Wednesday, May 18, 2011 8:49 AM > To: [email protected] > Subject: [iText-questions] Optional context retrieval > > I am trying to retrieve text from the pdf file and I am > having difficulties with (I guess) optional content. I placed > the screenshot of the file opened in Adobe Pro here: > > http://imageshack.us/f/863/samplereport.jpg/ > > Adobe shows the "hidden" text and also successfully exports > the file to "text" > format. I am trying to use iTextSharp to do the same, but I > can't extract the text that is visible in Adobe. > > Any help/advise is greatly appreciated. > > internal class PageReader > { > public void readPage(String pagepath) > { > test1(pagepath);// returns non-askii result > test2(pagepath);// returns only the last line (I > guess it is "direct content. > } > > private void test1(String pagepath) > { > PdfReader reader = new PdfReader(pagepath); > String textres = ""; > for (int i = 1; i <= reader.NumberOfPages; ++i) > { > byte[] lastpage = reader.GetPageContent(1); > if (lastpage == null) > return; > > PRTokeniser tokenizer = new PRTokeniser(lastpage); > while (tokenizer.NextToken()) > if (tokenizer.TokenType == > PRTokeniser.TokType.STRING) > textres += tokenizer.StringValue; > } > } > > private void test2(String pagepath) > { > PdfReader reader = new PdfReader(pagepath); > // String str = > PdfTextExtractor.GetTextFromPage(reader, 1); > > PdfReaderContentParser parser = new > PdfReaderContentParser(reader); > TextExtractionStrategy strategy = new > TextExtractionStrategy(); > String str = ""; > for (int i = 1; i <= reader.NumberOfPages; ++i) > { > > parser.ProcessContent<TextExtractionStrategy>(i, strategy); > str += strategy.txt; > } > } > } > > internal class TextExtractionStrategy : IRenderListener > { > public void BeginTextBlock() { } > public void EndTextBlock() { } > public void RenderImage(ImageRenderInfo renderInfo) { } > public void RenderText(TextRenderInfo renderInfo) > { > _str += renderInfo.GetText(); > } > private String _str; > > public String txt { get { return _str; } } > } > > > > -------------------------------------------------------------- > ---------------- > What Every C/C++ and Fortran developer Should Know! > Read this article and learn how Intel has extended the reach > of its next-generation tools to help Windows* and Linux* > C/C++ and Fortran developers boost performance applications - > including clusters. > http://p.sf.net/sfu/intel-dev2devmay > _______________________________________________ > iText-questions mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/itext-questions > > iText(R) is a registered trademark of 1T3XT BVBA. > Many questions posted to this list can (and will) be answered > with a reference to the iText book: > http://www.itextpdf.com/book/ Please check the keywords list > before you ask for examples: http://itextpdf.com/themes/keywords.php > > ------------------------------------------------------------------------------ What Every C/C++ and Fortran developer Should Know! Read this article and learn how Intel has extended the reach of its next-generation tools to help Windows* and Linux* C/C++ and Fortran developers boost performance applications - including clusters. http://p.sf.net/sfu/intel-dev2devmay _______________________________________________ iText-questions mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/itext-questions iText(R) is a registered trademark of 1T3XT BVBA. Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/ Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
