I would like to try replacing our external document storage with Lucene
stored fields, so a few questions before we proceed:
Background: we currently store complete documents in a simple binary file and
keep only offsets into this file as a stored field in the Lucene index. The
(compressed) documents are short, averaging 300-400 bytes each.
What we would like to try is storing these documents as binary stored fields
(compressed with a simple/fast static Huffman coder plus a dictionary), to
make this setup easier to maintain: Lucene already handles updates and
fetching of stored fields, and indexing now works very well from multiple
threads. The complete document would be the only stored field we have in the
index (I know about lazy loading).
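Roughly, that indexing side could look like the sketch below. Deflater with a
preset dictionary stands in here for our static Huffman + dictionary coder,
the "doc" field name is made up, and the Field(String, byte[], Field.Store)
constructor assumes the 2.x-era API (later versions spell this StoredField):

    import java.io.ByteArrayOutputStream;
    import java.util.zip.Deflater;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    class BinaryFieldIndexer {
        // Stand-in for our static Huffman + dictionary coder.
        static byte[] compress(byte[] raw, byte[] dict) {
            Deflater d = new Deflater(Deflater.BEST_SPEED);
            d.setDictionary(dict);  // preset dictionary, shared with the decoder
            d.setInput(raw);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
            byte[] buf = new byte[512];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            d.end();
            return out.toByteArray();
        }

        // The compressed document bytes are the only stored field.
        static void add(IndexWriter writer, byte[] raw, byte[] dict)
                throws Exception {
            Document doc = new Document();
            doc.add(new Field("doc", compress(raw, dict), Field.Store.YES));
            // ... indexed-only fields for searching would be added here ...
            writer.addDocument(doc);
        }
    }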
Put simply, what we do today is: search the index, get the offsets from the
found documents, and then fetch the documents from the binary file.
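In code, today's path and the direct stored-field fetch we are considering
look roughly like this (field names are illustrative, and getBinaryValue
assumes a Lucene version that has it; doc.getField("doc").binaryValue() is
the older spelling):

    import java.io.RandomAccessFile;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.IndexSearcher;

    class Fetch {
        // Today: the stored field holds only offset+length into the
        // external binary file; one extra seek fetches the bytes.
        static byte[] viaOffset(IndexSearcher searcher, int docId,
                                RandomAccessFile store) throws Exception {
            Document hit = searcher.doc(docId);
            long offset = Long.parseLong(hit.get("offset"));
            int length = Integer.parseInt(hit.get("length"));
            byte[] compressed = new byte[length];
            store.seek(offset);
            store.readFully(compressed);
            return compressed;  // still compressed; the caller decodes
        }

        // Proposed: the bytes come straight out of the stored field.
        static byte[] direct(IndexSearcher searcher, int docId)
                throws Exception {
            return searcher.doc(docId).getBinaryValue("doc");
        }
    }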
So the questions would be:
1. Does anyone see any theoretical or practical reason why our homemade
"fetch offset from Lucene stored field -> jump to that offset in the document
file" would be faster than fetching the stored field from Lucene directly
(both paths are sketched above)?
2. We use the Lucene index with MMapDirectory now, so the concern is that the
index could grow too large to map with a stored field like that. Is there a
way to say "do not use MMapDirectory for stored fields, use FSDirectory
instead"? I think not, but it is worth asking, as I think this could be a
useful feature... Imagine being able to say "load terms with RAMDirectory,
postings with MMapDirectory, and stored fields with FSDirectory" (a sketch of
what I mean follows)... of course, this is only for searching, not indexing.
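To make the wish concrete: later Lucene versions ship a FileSwitchDirectory
that dispatches on file extension, and if it (or a hand-rolled equivalent) is
available, the split could look like the sketch below. Its availability in
the version in use, and .fdt/.fdx being the stored-field extensions, are
assumptions to verify:

    import java.io.File;
    import java.util.Arrays;
    import java.util.HashSet;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FileSwitchDirectory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.store.NIOFSDirectory;

    class SplitDirectories {
        // Stored-field files (.fdt data, .fdx index) are read through a
        // plain file-based directory; everything else stays memory-mapped.
        static Directory open(File path) throws Exception {
            return new FileSwitchDirectory(
                new HashSet<String>(Arrays.asList("fdt", "fdx")),
                new NIOFSDirectory(path),  // primary: stored fields
                new MMapDirectory(path),   // secondary: terms, postings, ...
                true);                     // close both on close()
        }
    }

The "load terms with RAMDirectory" part could presumably be layered the same
way by nesting another such directory keyed on the term-dictionary
extensions (.tis/.tii).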
Thanks a lot.
Eks