Re: Index physical size

Jokin Cuadrado Tue, 17 Jul 2007 04:33:34 -0700

I'm wondering whether this might be an issue with the text encoding
used. If the difference is just 50%, maybe Lucene.Net is by default
using an encoding that needs 2 bytes for each character, and Luke is
using one that needs only 1 byte.
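
Just to illustrate the idea, here is a small Python sketch. It is not
what Lucene.Net or Luke do internally (I have not checked what either
one writes), and the sample string is made up. It only shows that the
same plain ASCII text takes twice as many bytes in UTF-16 as in UTF-8:

```python
# Illustration only: byte counts for the same text in two encodings.
# The sample string is hypothetical and does not come from a real index.
text = "index physical size"

utf8_bytes = text.encode("utf-8")       # 1 byte per ASCII character
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per ASCII character, no BOM

print("characters:", len(text))
print("UTF-8 bytes:", len(utf8_bytes))
print("UTF-16 bytes:", len(utf16_bytes))
```

For text outside the ASCII range the ratio is different, so this would
only explain a 50% difference if the indexed content is mostly ASCII.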


Regarding the number of files, maybe Luke doesn't take account of the
"deletables" file, which lists the files that are no longer used and
may be deleted, since it doesn't delete files. But I think that's not
relevant to the other question.
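
To see exactly which files are in the index directory, and how big
each one is, before and after opening it with Luke, something like the
following could be used. It is only a sketch: the function names and
the example path are made up, and it just lists a directory without
knowing anything about the Lucene file formats.

```python
import os


def list_index_files(index_dir):
    """Return (name, size_in_bytes) for each regular file in index_dir, sorted by name."""
    entries = []
    with os.scandir(index_dir) as it:
        for entry in it:
            if entry.is_file():
                entries.append((entry.name, entry.stat().st_size))
    return sorted(entries)


def total_size(files):
    """Sum the sizes from a list produced by list_index_files."""
    return sum(size for _, size in files)


# Hypothetical usage (the path is an assumption, replace it with your own):
# files = list_index_files(r"C:\myindex")
# for name, size in files:
#     print(name, size)
# print("total:", total_size(files))
```

Running it once after Lucene.Net finishes and once after Luke has
opened the index would show which file is the extra one and where the
size difference comes from.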

jokin.

On 7/17/07, Simone Busoli <[EMAIL PROTECTED]> wrote:


 Hi Jokin,

 actually I found some information about it. As far as I've discovered
compression can be applied to fields of documents, before adding them to the
index, even if Lucene.Net doesn't supply it out of the box. But the issue I
reported doesn't have to do with this, because index size reduction seems to
be applied at a higher level by Luke, that is, to an index already containing
documents with uncompressed fields. In fact, when reopening the index with
Lucene.Net after it has been opened (and, as you can see, optimized) by Luke,
I am still able to read it, even though I didn't configure support for compression.
This means that Luke didn't compress the contents of the documents contained
in the index (it would be a weird behavior after all), but instead did
something like optimizing the format of the files of the index. Another
detail is that when I write my index with Lucene.Net I end up with at least
3 files, while when I open it with Luke I always get 2 files only. And yes,
I am calling IndexWriter.Optimize() when finished indexing. Am I missing
something maybe?

 Simone
