Cescy wrote:
> Hi,
> 
> 
> I am developing a PDF search engine, for use on a local computer only, to
> search a large number of PDF documents.
> 
> 
> I used PDFBox + Lucene to index and search, and now I have to display the
> matching context from the PDF file to the user in the user interface. How can I achieve this?

I have completed a project that does exactly the same thing.  I put the PDF
text in XML files.  Then, after I run a Lucene search, I read the text from
the XML files.  I do not store the text in the Lucene index; that would
bloat the index and slow down my searches.  FYI -- I use PDFBox to
extract the "searchable" text and tesseract (OCR) to extract the
text from the images within the PDFs.  To make tesseract work
correctly I have to use ImageMagick to make many modifications to the
images first.  Image modification/OCR is a slow process and is
extremely resource intensive (CPU utilization specifically, disk I/O
to a lesser extent).
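
A minimal sketch of the XML side-file idea, using only the JDK's built-in XML classes. The element names (`document`, `page`), the `number` attribute, and the class `PageTextXml` are assumptions made for illustration; the message does not say how the XML files are laid out. The Lucene index would presumably then hold only an identifier or path pointing at the XML file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: keeps extracted page text in XML, outside the search index.
final class PageTextXml {

    // Serializes one <page number="N"> element per extracted page (1-based).
    static String toXml(List<String> pages) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("document");
            doc.appendChild(root);
            for (int i = 0; i < pages.size(); i++) {
                Element page = doc.createElement("page");
                page.setAttribute("number", Integer.toString(i + 1));
                page.setTextContent(pages.get(i)); // escaped on output
                root.appendChild(page);
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build page XML", e);
        }
    }

    // Returns the text of the requested page, or null if there is no such page.
    static String pageText(String xml, int pageNumber) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList pages = doc.getElementsByTagName("page");
            for (int i = 0; i < pages.getLength(); i++) {
                Element page = (Element) pages.item(i);
                if (Integer.parseInt(page.getAttribute("number")) == pageNumber) {
                    return page.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not read page XML", e);
        }
    }
}
```

The string returned by `toXml` can be written to one file per PDF and read back after a search hit. Text extracted from PDFs may contain control characters that XML 1.0 cannot represent; this sketch does not filter them.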

As far as displaying the extracted text, I would use an AJAX framework
that provides a nice pop-up view of the text.  This pop-up should
also have built-in paging.  I use Lucene's built-in highlighting of
matches as well.
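
The paging part can be sketched on the server side as a function that cuts the extracted text into chunks for the pop-up to request one at a time. This is only an illustration: the class `TextPager`, the method `paginate`, and the character-budget approach are assumptions, not something described in the message, and highlighting is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits extracted text into display pages.
final class TextPager {

    // Splits text into chunks of at most maxChars characters, breaking at
    // whitespace where possible so that words are not cut in half.
    static List<String> paginate(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> pages = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                // Walk back to the nearest whitespace inside the window.
                int ws = end;
                while (ws > start && !Character.isWhitespace(text.charAt(ws))) {
                    ws--;
                }
                if (ws > start) {
                    end = ws; // otherwise: one long word, hard split at end
                }
            }
            String page = text.substring(start, end).trim();
            if (!page.isEmpty()) {
                pages.add(page);
            }
            start = end;
            // Skip the whitespace that separated this page from the next.
            while (start < text.length()
                    && Character.isWhitespace(text.charAt(start))) {
                start++;
            }
        }
        return pages;
    }
}
```

An AJAX handler could then return `paginate(text, size).get(n)` for page `n`; a real viewer would more likely page by the PDF's own page boundaries if those were kept during extraction.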

Oh, almost forgot -- I use PDFBox to extract the images from the PDFs.
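
Once the images are extracted, the ImageMagick and tesseract steps mentioned earlier could be driven from Java roughly as follows. The specific ImageMagick options (grayscale, 200% resize, 50% threshold) are guesses at typical OCR clean-up, since the message does not say which modifications are applied; the class and method names are hypothetical. ImageMagick 7 installs the tool as `magick` rather than `convert`.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds and runs the external clean-up and OCR commands.
final class OcrCommands {

    // ImageMagick command that converts the image to grayscale, enlarges it,
    // and binarizes it. The option values are assumptions to be tuned.
    static List<String> preprocess(String inputImage, String outputImage) {
        return Arrays.asList("convert", inputImage,
                "-colorspace", "Gray",
                "-resize", "200%",
                "-threshold", "50%",
                outputImage);
    }

    // tesseract writes the recognized text to <outputBase>.txt.
    static List<String> ocr(String image, String outputBase) {
        return Arrays.asList("tesseract", image, outputBase);
    }

    // Runs a command, passing its output through to the console,
    // and returns the process exit code.
    static int run(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Typical use would be `run(preprocess("page1.png", "page1.tif"))` followed by `run(ocr("page1.tif", "page1"))`, checking both exit codes. Because both steps are CPU heavy, it may make sense to run them in a bounded worker pool rather than inline with indexing.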

James
> 
> 
> THX

-- 
James J. Wilson II
Systems Engineer
U.S. District Court
District of New Mexico
333 Lomas Blvd., NW
Albuquerque, NM 87102
Phone:  (505) 348-2081
Fax:    (505) 348-2028

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

