Hi, I have to search a single PDF document for a requested string and, if that string is found, return the number of the page where it was found. The requested string can be anything in the PDF document.
It is a big document (about 5000 pages), so I'm asking whether this is possible with Lucene. I'm using PDFBox and I found a way to do it (searching with indexOf page by page), but it is too slow:

    PDDocument pddDocument = PDDocument.load(f);
    PDFTextStripper textStripper = new PDFTextStripper();
    // getEndPage() defaults to Integer.MAX_VALUE, so take the page count from the document
    int lastpage = pddDocument.getNumberOfPages();
    String page = null;
    int found = 0;
    for (int i = 1; i <= lastpage; i++) {   // <= so the last page is searched too
        // restrict extraction to the single page i
        textStripper.setStartPage(i);
        textStripper.setEndPage(i);
        page = textStripper.getText(pddDocument);
        found = page.indexOf(searchtext);
        if (found >= 0) {                   // indexOf returns 0 for a match at the very start
            returnpage = i;
            break;
        }
    }

Is there a way to speed up the search with Lucene? Can I use indexing to solve this problem?

Thanks.
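To illustrate the kind of per-page indexing being asked about, here is a minimal sketch in plain Java. It does not use Lucene or PDFBox. It only shows the idea of extracting the text of every page once, keeping it in memory, and answering each search from the cached strings instead of re-parsing the PDF. The class name PageTextIndex and its method firstPageContaining are hypothetical, and the page strings stand in for whatever PDFTextStripper would return for each page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: caches the text of each page so that repeated
// searches do not have to extract text from the PDF again.
class PageTextIndex {
    private final List<String> pages;

    // pageTexts.get(0) is the text of page 1, pageTexts.get(1) is page 2, and so on.
    // In a real program these strings would be produced once, one per page.
    PageTextIndex(List<String> pageTexts) {
        this.pages = new ArrayList<>(pageTexts);
    }

    // Returns the 1-based number of the first page whose text contains
    // searchText (case-sensitive, like String.indexOf), or -1 if none does.
    int firstPageContaining(String searchText) {
        for (int i = 0; i < pages.size(); i++) {
            if (pages.get(i).contains(searchText)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> texts = new ArrayList<>();
        texts.add("first page text");
        texts.add("second page mentions lucene");
        PageTextIndex index = new PageTextIndex(texts);
        System.out.println(index.firstPageContaining("lucene"));
    }
}
```

The linear scan over cached strings is still proportional to the size of the document, so this only removes the repeated extraction cost. A real full-text index would avoid the scan as well.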