Hi, This is the first time i am using Lucene.
I need to index pdf's with very few fields, title, date and body (long field) for a web based search. The results i need to display have to show not only the documents found but for each document a snapshot of the text where the search term has been found. This is analogous to the way google displays search results. That is to say ... some words and first instance of search Term and some more words ... some more words second instance of search term and some more words... etc. To do this i would need the original text of the document for each hit. As far as i understand Lucene does not save the original text of the document in the index. I am not using a database and would prefer not to have to store the original document text elsewhere. One way i could do this would be to take the hits from Lucene and reopen each pdf to extract the original text at run time however i fear that with many results this would be very slow. What would you recommend me to do? Thanks max -- View this message in context: http://www.nabble.com/Beginner%3A-Best-way-to-index-and-display-orginal-text-of-pdfs-in-search-results-tp20971377p20971377.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org