Re: Beginner: Best way to index and display orginal text of pdfs in search results

Paul Libbrecht Fri, 12 Dec 2008 03:47:55 -0800

I also encountered these options of the Field constructor but I never found a way to be sure that the field is really not loaded in RAM and only return with Field.reader(). There seems to be no contract in the javadoc. Moreover the reader access methods went away between 1.9 and 2.2 if I don't mistake... so I had the impression it was not wanted to store "blobs" in the index.


Also, reader is not enough to do a decent job to store PDFs.

It should be a binary format (so getBinaryValue() should be used) and it should be an input-stream and not an in-memory array!

Echoes of a long frustrated user which implemented its own "mass- storage" because of that.

thanks for hints and even contradictions!

paul


Le 12-déc.-08 à 10:49, Ian Lea a écrit :

Lucene can store the original text of the document.  You make the
lucene fields to do what you need.  Have a look at the apidocs for
Field.Store and you'll see that you've got three choices: Yes, No or
Compress.

For your display snapshots, have a look at the lucene highlighter package.


And all newcomers to Lucene could do a lot worse than getting hold of
a copy of Lucene in Action.  Somewhat out of date but the principles
are still valid.

smime.p7s
Description: S/MIME cryptographic signature

Re: Beginner: Best way to index and display orginal text of pdfs in search results

Reply via email to