Hi
Lucene can store the original text of the document. You make the lucene fields to do what you need. Have a look at the apidocs for Field.Store and you'll see that you've got three choices: Yes, No or Compress. For your display snapshots, have a look at the lucene highlighter package. And all newcomers to Lucene could do a lot worse than getting hold of a copy of Lucene in Action. Somewhat out of date but the principles are still valid. -- Ian. On Fri, Dec 12, 2008 at 8:34 AM, maxmil <m...@alwayssunny.com> wrote: > > Hi, > > This is the first time i am using Lucene. > > I need to index pdf's with very few fields, title, date and body (long > field) for a web based search. > > The results i need to display have to show not only the documents found but > for each document a snapshot of the text where the search term has been > found. This is analogous to the way google displays search results. That is > to say > > ... some words and first instance of search Term and some more words ... > some more words second instance of search term and some more words... > > etc. > > To do this i would need the original text of the document for each hit. As > far as i understand Lucene does not save the original text of the document > in the index. > > I am not using a database and would prefer not to have to store the original > document text elsewhere. > > One way i could do this would be to take the hits from Lucene and reopen > each pdf to extract the original text at run time however i fear that with > many results this would be very slow. > > What would you recommend me to do? > > Thanks > > max > -- > View this message in context: > http://www.nabble.com/Beginner%3A-Best-way-to-index-and-display-orginal-text-of-pdfs-in-search-results-tp20971377p20971377.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org