Re: Beginner: Best way to index and display original text of pdfs in search results

maxmil Fri, 12 Dec 2008 03:05:29 -0800

Thanks very much. Looks like Field.Store.COMPRESS is what I want.
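
To get a feel for the trade-off behind a compressed stored field (a smaller index in exchange for some CPU time when the text is written and read back), here is a minimal sketch using only the JDK's java.util.zip classes. It is not Lucene code and makes no claim about how Field.Store.COMPRESS is implemented; the class name StoredTextCompression and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: JDK DEFLATE round trip for a long text body.
// Hypothetical helper, not part of Lucene.
public class StoredTextCompression {

    // Compress the UTF-8 bytes of the text with DEFLATE.
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original text; fails loudly on truncated or corrupt input.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Repetitive text, standing in for the body extracted from a PDF.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            body.append("some words and the search term and some more words ");
        }
        byte[] packed = compress(body.toString());
        System.out.println("original chars: " + body.length());
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok: " + decompress(packed).equals(body.toString()));
    }
}
```

How much space is saved depends heavily on the text; short or already dense bodies gain little.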

I'll also have a look at the highlighter package and at getting hold of
Lucene in Action.
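
For reference, the kind of output described in the original question ("... some words and first instance of search term and some more words ...") can be sketched in plain Java once the stored text is available. This is a naive stand-in for illustration, not the Lucene highlighter and not its API: it matches a single literal term case-insensitively and ignores analysis, stemming and phrase queries. The class SnippetSketch and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: cut short fragments of text around each hit.
// Hypothetical helper, not part of Lucene.
public class SnippetSketch {

    // Case-insensitive indexOf that keeps offsets valid for the original text.
    static int indexOfIgnoreCase(String text, String term, int from) {
        for (int i = from; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns up to maxSnippets fragments, each with `context` characters
    // on either side of an occurrence of `term`.
    static List<String> snippets(String text, String term, int context, int maxSnippets) {
        List<String> result = new ArrayList<>();
        if (term.isEmpty()) {
            return result;
        }
        int from = 0;
        while (result.size() < maxSnippets) {
            int hit = indexOfIgnoreCase(text, term, from);
            if (hit < 0) {
                break;
            }
            int start = Math.max(0, hit - context);
            int end = Math.min(text.length(), hit + term.length() + context);
            result.add("... " + text.substring(start, end).trim() + " ...");
            // Continue after this window so fragments do not overlap.
            from = end;
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "Lucene stores text. The highlighter marks each term. "
                + "Another Term appears here.";
        for (String s : snippets(body, "term", 5, 3)) {
            System.out.println(s);
        }
    }
}
```

The windows are cut at fixed character offsets, so fragments may start or end mid-word; a real highlighter works on the analyzed token stream instead.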




Ian Lea wrote:
> 
> Hi
> 
> 
> Lucene can store the original text of the document.  You configure each
> Lucene field to do what you need.  Have a look at the apidocs for
> Field.Store and you'll see that you've got three choices: YES, NO or
> COMPRESS.
> 
> For your display snapshots, have a look at the Lucene highlighter package.
> 
> And all newcomers to Lucene could do a lot worse than getting hold of
> a copy of Lucene in Action.  Somewhat out of date but the principles
> are still valid.
> 
> 
> --
> Ian.
> 
> On Fri, Dec 12, 2008 at 8:34 AM, maxmil <[email protected]> wrote:
>>
>> Hi,
>>
>> This is the first time I am using Lucene.
>>
>> I need to index PDFs with very few fields: title, date and body (a long
>> field), for a web-based search.
>>
>> The results I need to display have to show not only the documents found
>> but, for each document, a snapshot of the text where the search term was
>> found. This is analogous to the way Google displays search results. That
>> is to say:
>>
>>  ... some words and first instance of search Term and some more words ...
>> some more words second instance of search term and some more words...
>>
>> etc.
>>
>> To do this I would need the original text of the document for each hit.
>> As far as I understand, Lucene does not save the original text of the
>> document in the index.
>>
>> I am not using a database and would prefer not to have to store the
>> original document text elsewhere.
>>
>> One way I could do this would be to take the hits from Lucene and reopen
>> each PDF to extract the original text at run time; however, I fear that
>> with many results this would be very slow.
>>
>> What would you recommend?
>>
>> Thanks
>>
>> max
>> --
>> View this message in context:
>> http://www.nabble.com/Beginner%3A-Best-way-to-index-and-display-orginal-text-of-pdfs-in-search-results-tp20971377p20971377.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Beginner%3A-Best-way-to-index-and-display-orginal-text-of-pdfs-in-search-results-tp20971377p20973618.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


