I'm using the latest release of Lucene.Net.
Here are the steps my application follows:
1. create index
2. open index reader to remove stuff
3. close index reader
4. open index writer to add stuff
5. optimize and close index writer
Steps 2-5 are repeated at intervals, so there is always at most one object
writing to the index at any point in time.
Not a big issue after all, but thanks for your help.
Simone
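A toy Python model of the update cycle described above may make the "at most one writer" point concrete. It assumes a single write lock that the deleting reader (steps 2-3) and the writer (steps 4-5) must each take in turn. As far as I know, Lucene guards index modifications with a similar write lock, but `ToyIndex` and `update_cycle` are hypothetical names, not the Lucene.Net API, and optimizing is not modeled.

```python
class ToyIndex:
    """Hypothetical in-memory index used only to illustrate the update cycle."""

    def __init__(self):
        self.docs = set()
        self.lock_owner = None  # stands in for a single write lock

    def acquire(self, owner):
        # Refuse a second modifier while someone else holds the lock.
        if self.lock_owner is not None:
            raise RuntimeError(f"{owner}: index is locked by {self.lock_owner}")
        self.lock_owner = owner

    def release(self, owner):
        if self.lock_owner != owner:
            raise RuntimeError(f"{owner} does not hold the lock")
        self.lock_owner = None


def update_cycle(index, to_remove, to_add):
    # Steps 2-3: open a "reader", delete documents, close it (releasing the lock).
    index.acquire("reader")
    try:
        index.docs -= set(to_remove)
    finally:
        index.release("reader")
    # Steps 4-5: open a "writer", add documents, close it (optimize not modeled).
    index.acquire("writer")
    try:
        index.docs |= set(to_add)
    finally:
        index.release("writer")


index = ToyIndex()                   # step 1: create the index
update_cycle(index, [], ["a", "b"])
update_cycle(index, ["a"], ["c"])
print(sorted(index.docs))            # ['b', 'c']
```

Because each phase releases the lock before the next one starts, the reader and the writer never modify the index at the same time.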
Jokin Cuadrado wrote:
Which version of Lucene are you using?
Do you have a reader open?
It seems reasonable to me, because if I remember correctly, old unused
files are cleaned up when the index is opened. Have you tried opening
the index with Lucene.Net after creating it, to see if the result is
the same?
jokin
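The cleanup-on-open behavior Jokin describes can be illustrated with a toy model of deferred deletion. After optimizing, the old segment is deleted if possible; if it cannot be (here we assume it is still held open by some process, as can happen on Windows), its name is recorded in a pending set that plays the role of the "deletable" file, and deletion is retried the next time the index is opened. `ToyIndexDirectory` and its methods are hypothetical; this is not Lucene.Net's implementation. The file names match the listing in the quoted message.

```python
class ToyIndexDirectory:
    """Toy model of deferred file deletion; hypothetical, not Lucene.Net code."""

    def __init__(self, files):
        self.files = set(files)
        self.held_open = set()  # files some process still has open (assumption)
        self.pending = set()    # plays the role of the "deletable" file

    def listing(self):
        # The "deletable" file only exists while deletions are pending.
        names = set(self.files)
        if self.pending:
            names.add("deletable")
        return names

    def delete_files(self, names):
        for name in names:
            if name in self.held_open:
                # Assumption: an open file cannot be deleted (typical on Windows),
                # so remember its name for a later retry.
                self.pending.add(name)
            else:
                self.files.discard(name)
                self.pending.discard(name)

    def commit_optimize(self, old_segments, new_segment):
        # Write the merged segment, then try to drop the old ones.
        self.files.add(new_segment)
        self.delete_files(old_segments)

    def open(self):
        # Retry any pending deletions when the index is opened again.
        self.delete_files(list(self.pending))


d = ToyIndexDirectory({"segments", "_bd.cfs"})
d.held_open.add("_bd.cfs")
d.commit_optimize(["_bd.cfs"], "_i5.cfs")
print(sorted(d.listing()))  # ['_bd.cfs', '_i5.cfs', 'deletable', 'segments']
d.held_open.clear()         # whoever had _bd.cfs open closes it
d.open()
print(sorted(d.listing()))  # ['_i5.cfs', 'segments']
```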
On 7/18/07, Simone Busoli <[EMAIL PROTECTED]> wrote:
I don't know. This is the situation when I create and optimize the index
with Lucene.Net:
segments      28 bytes
_i5.cfs      543 KB
deletable     12 bytes
_bd.cfs      317 KB
Once the index is opened with Luke, only segments and _i5.cfs remain,
untouched. So the only difference is that _bd.cfs and deletable are
removed. Well, deletable looked like a good candidate for deletion, but
what about _bd.cfs? It seems it wasn't needed after all.
Simone
Jokin Cuadrado wrote:
> I'm wondering whether it could be an issue with the text encoding
> used. If it's exactly 50%, maybe Lucene.Net uses an encoding by
> default that needs 2 bytes for each character, while Luke uses one
> that only needs 1 byte.
>
> Regarding the number of files, maybe Luke doesn't take into account
> the "deletable" file, which lists the files that are no longer used
> and can be deleted, because it doesn't delete files. But I think that
> is not relevant to the other question.
>
> jokin.
>
> On 7/17/07, Simone Busoli <[EMAIL PROTECTED]> wrote:
>>
>> Hi Jokin,
>>
>> Actually, I found some information about it. As far as I've
>> discovered, compression can be applied to document fields before
>> adding them to the index, even though Lucene.Net doesn't supply it
>> out of the box. But the issue I reported has nothing to do with this,
>> because the index size reduction seems to be applied at a higher
>> level by Luke, I mean, to an index already containing documents with
>> uncompressed fields. In fact, when I reopen the index with Lucene.Net
>> after it has been opened (and, you see, optimized) by Luke, I am
>> still able to read it, even though I didn't configure support for
>> compression. This means that Luke didn't compress the contents of the
>> documents in the index (that would be weird behavior after all), but
>> instead did something like optimizing the format of the index files.
>>
>> Another detail is that when I write my index with Lucene.Net I end up
>> with at least 3 files, while when I open it with Luke I always get
>> only 2 files. And yes, I am calling IndexWriter.Optimize() when I
>> finish indexing. Am I missing something?
>>
>> Simone
>
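A sketch of the field-level compression Simone mentions: a field's text is compressed before it is stored and decompressed after it is read back. It uses Python's standard `zlib` module purely for illustration; `compress_field` and `decompress_field` are hypothetical helpers, not Lucene.Net calls.

```python
import zlib


def compress_field(text: str) -> bytes:
    # Encode the field text as UTF-8, then deflate it before storing.
    return zlib.compress(text.encode("utf-8"))


def decompress_field(data: bytes) -> str:
    # Inverse of compress_field: inflate, then decode back to a string.
    return zlib.decompress(data).decode("utf-8")


body = "Lucene.Net stores field text. " * 40  # repetitive text compresses well
stored = compress_field(body)
assert decompress_field(stored) == body
print(len(body.encode("utf-8")), "bytes raw,", len(stored), "bytes compressed")
```

A value compressed this way can only be stored and read back; it cannot be tokenized and searched as text.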