Hello
I have been using Lucene for a while now, and like most people seem to do, I
used to store only a few important fields of a document in the index. But
recently I thought: why not store the whole document's bytes as an untokenized
field in the index, to simplify retrieval? For example, serialize the PDF file
into a byte[] and then save those bytes as a field in the index (some gzip and
Base64 encoding may be needed as glue logic). Then I could delete the original
file from the system. Is there any reason against this idea? Can Lucene handle
this large a volume of stored data?
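
For concreteness, here is a minimal sketch of what I have in mind, assuming a
recent Lucene version with the StoredField API and a hypothetical "content"
field name; the gzip step is optional, and Base64 would only matter if the
payload had to be stored as text rather than binary:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

public class StoreWholePdf {
    public static void main(String[] args) throws Exception {
        // Hypothetical input file; in practice this would be each PDF to index.
        byte[] raw = Files.readAllBytes(Paths.get("example.pdf"));

        // Optional gzip compression before storing the bytes.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        byte[] compressed = bos.toByteArray();

        try (FSDirectory dir = FSDirectory.open(Paths.get("index"));
             IndexWriter writer = new IndexWriter(dir,
                     new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // Searchable metadata fields as usual...
            doc.add(new TextField("title", "example.pdf", Field.Store.YES));
            // ...plus the whole file as a stored-only (untokenized) binary field.
            doc.add(new StoredField("content", compressed));
            writer.addDocument(doc);
        }
    }
}

If I understand the API correctly, at query time the bytes would come back with
the stored document, e.g. via searcher.doc(docId).getBinaryValue("content"),
and decompressing them would reconstruct the original PDF.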
