Thanks very much. Looks like Field.Store.COMPRESS is what i want. I'll also have a look at the search highlight stuff and getting Lucene in Action.
Ian Lea wrote: > > Hi > > > Lucene can store the original text of the document. You make the > lucene fields to do what you need. Have a look at the apidocs for > Field.Store and you'll see that you've got three choices: Yes, No or > Compress. > > For your display snapshots, have a look at the lucene highlighter package. > > And all newcomers to Lucene could do a lot worse than getting hold of > a copy of Lucene in Action. Somewhat out of date but the principles > are still valid. > > > -- > Ian. > > On Fri, Dec 12, 2008 at 8:34 AM, maxmil <m...@alwayssunny.com> wrote: >> >> Hi, >> >> This is the first time i am using Lucene. >> >> I need to index pdf's with very few fields, title, date and body (long >> field) for a web based search. >> >> The results i need to display have to show not only the documents found >> but >> for each document a snapshot of the text where the search term has been >> found. This is analogous to the way google displays search results. That >> is >> to say >> >> ... some words and first instance of search Term and some more words ... >> some more words second instance of search term and some more words... >> >> etc. >> >> To do this i would need the original text of the document for each hit. >> As >> far as i understand Lucene does not save the original text of the >> document >> in the index. >> >> I am not using a database and would prefer not to have to store the >> original >> document text elsewhere. >> >> One way i could do this would be to take the hits from Lucene and reopen >> each pdf to extract the original text at run time however i fear that >> with >> many results this would be very slow. >> >> What would you recommend me to do? >> >> Thanks >> >> max >> -- >> View this message in context: >> http://www.nabble.com/Beginner%3A-Best-way-to-index-and-display-orginal-text-of-pdfs-in-search-results-tp20971377p20971377.html >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > -- View this message in context: http://www.nabble.com/Beginner%3A-Best-way-to-index-and-display-orginal-text-of-pdfs-in-search-results-tp20971377p20973618.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org