Cescy wrote:
> Hi,
>
> I am developing a pdf search engine, just use in local computer to search
> massive pdf documents.
>
> I used pdfbox+lucene to index and search, and then I have to display the
> context to the user in pdf file in user interface. HOW CAN I ACHIEVE THIS???

I have completed a project that does exactly the same thing. I put the PDF text in XML files, and after a Lucene search I read the text back from those XML files. I do not store the text in the Lucene index, because that would bloat the index and slow down my searches.

FYI: I use PDFBox to extract the "searchable" text, and I use tesseract (OCR) to extract the text from the images within the PDFs. To make tesseract work correctly I have to use ImageMagick to make many modifications to the images first. Image modification plus OCR is a slow process and extremely resource intensive (CPU utilization specifically, disk I/O to a lesser extent).

For displaying the extracted text, I would use an AJAX framework that provides a nice pop-up view of the text. The pop-up should also have built-in paging. I use Lucene's built-in highlighting of matches as well.

Oh, almost forgot: I use PDFBox to extract the images from the PDFs.

James

> THX

--
James J. Wilson II
Systems Engineer
U.S. District Court
District of New Mexico
333 Lomas Blvd., NW
Albuquerque, NM 87102
Phone: (505) 348-2081
Fax: (505) 348-2028

_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
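
The "text in XML files, not in the index" idea above could be sketched roughly as follows. This is only an illustration, not the code from the project described in the reply: the class name SidecarStore, the element names (document, page) and the one-file-per-PDF layout are all assumptions. It uses only the JDK's XML classes; the extracted page text would come from PDFBox or tesseract, and the Lucene index would store just the document ID.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: keeps extracted page text in one XML file per PDF,
// outside the Lucene index. The index would store only the docId.
class SidecarStore {

    // Writes <document id="..."><page n="1">...</page>...</document> to dir/docId.xml.
    static void write(Path dir, String docId, List<String> pages) {
        try {
            Files.createDirectories(dir);
            Document xml = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = xml.createElement("document");
            root.setAttribute("id", docId);
            xml.appendChild(root);
            for (int i = 0; i < pages.size(); i++) {
                Element page = xml.createElement("page");
                page.setAttribute("n", Integer.toString(i + 1));
                page.setTextContent(pages.get(i)); // '<' and '&' are escaped when serialized
                root.appendChild(page);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            try (OutputStream out = Files.newOutputStream(dir.resolve(docId + ".xml"))) {
                t.transform(new DOMSource(xml), new StreamResult(out));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the page texts back, in document order, for the given docId.
    static List<String> read(Path dir, String docId) {
        try (InputStream in = Files.newInputStream(dir.resolve(docId + ".xml"))) {
            Document xml = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            NodeList nodes = xml.getElementsByTagName("page");
            List<String> pages = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                pages.add(nodes.item(i).getTextContent());
            }
            return pages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pdf-text");
        write(dir, "report-001", List.of("Text of page one.", "Page two mentions Lucene & PDFBox."));
        // After a Lucene hit returns the stored ID "report-001", load the text for display.
        for (String page : read(dir, "report-001")) {
            System.out.println(page);
        }
    }
}
```

Keeping one page element per PDF page would also make the paging in the pop-up view straightforward, since each page can be served on its own. Note that text extracted from real PDFs may contain control characters that XML 1.0 does not allow, so some filtering before writing is likely needed.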
