There was a bug in 1.4 (and maybe 1.4.1?) that kept some index files around that were not used.

Are you using Lucene 1.4.3?  If not, try that and see if it helps.
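
If upgrading alone doesn't reclaim the space, running a full optimize
will merge everything down to a single segment and let Lucene delete
the leftover files. A minimal sketch against the 1.4.x API (the index
path is a placeholder):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    // Open the existing index (create = false) and collapse it to one segment.
    IndexWriter writer = new IndexWriter("/path/to/idx",
            new StandardAnalyzer(), false);
    writer.optimize();   // merges all segments; obsolete files are deleted
    writer.close();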

        Erik

On Dec 6, 2004, at 12:17 PM, Garrett Heaver wrote:

No, there are no duplicate copies - I see the correct number of
documents when I view the index through Luke, and the indexes don't
overlap - the temporary index is destroyed after it is added to the
main index. I'm currently at index version 159, and it seems that all
of my .prx files come in at around 1435 MB (ouch)


Thanks
Garrett

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: 06 December 2004 17:12
To: Lucene Users List
Subject: Re: addIndexes() Size

If I were you, I would first use Luke to peek at the index.  You may
find something obvious there, like multiple copies of the same
Document.
Does your temp index 'overlap' with the Server A index in terms of
Documents?  If so, you will end up with multiple copies, as the
addIndexes method doesn't detect and remove duplicate Documents.
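
If they can overlap, one option is to delete any existing copies from
the main index before calling addIndexes, keyed on whatever unique
field you index. A rough sketch against the 1.4.x API - the "id"
field and the ids array are assumptions, not your actual schema:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    // ids: the unique keys of the batch about to be merged in
    IndexReader reader = IndexReader.open("\\\\ServerA\\idx");
    for (int i = 0; i < ids.length; i++) {
        reader.delete(new Term("id", ids[i]));  // deletes every matching doc
    }
    reader.close();  // closing the reader commits the deletions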

Otis

--- Garrett Heaver <[EMAIL PROTECTED]> wrote:

Hi.



It's probably really simple to explain this, but since I'm not up to
speed on the way Lucene stores the data, I'm a little confused.



I'm building an index, which resides on Server A, with the Lucene
service running on Server B. Not to bore you with the details, but
because of the network transfer rate etc. I'm keeping the actual
index on \\ServerA\idx and building a temp index at \\ServerB\idx\temp
(obviously because the local FS is much faster for the service), then
calling addIndexes to import the temp index into the Server A index
before destroying the Server B index, holding off for a bit, and then
checking for new documents.
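
The merge step itself looks roughly like this (simplified - the
analyzer is a stand-in for whatever the service actually uses, and
error handling is omitted):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Merge the local temp index into the main index on Server A.
    Directory tempDir = FSDirectory.getDirectory("\\\\ServerB\\idx\\temp", false);
    IndexWriter writer = new IndexWriter("\\\\ServerA\\idx",
            new StandardAnalyzer(), false);          // open existing index
    writer.addIndexes(new Directory[] { tempDir });  // leaves the index optimized
    writer.close();
    tempDir.close();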



All works grand, BUT the size of the resultant index on Server A is
HUGE in comparison to one I'd build from start to finish (i.e. a
simple addDocument index) - 38 GB for 220,000 unstored items cannot
be right (to give you an idea of how mad this seems, the backed-up
copy of the database from which the data is pulled is only 2 GB)



I've considered that it might be the number of items that have to be
integrated each time addIndexes is called - right now I'm adding
around 10,000 at a time (I had tried 1,000 at a time, but that looked
like it was going to end up even larger still)



I'm holding off on twiddling minMergeDocs and mergeFactor until I can
get a better understanding of what's going on here.
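
When I do get there, my understanding is that in 1.4.x they're just
public fields on IndexWriter. The values below are only illustrative,
not recommendations:

    IndexWriter writer = new IndexWriter("\\\\ServerA\\idx",
            new StandardAnalyzer(), false);
    writer.mergeFactor = 10;     // segments merged at once (1.4 default: 10)
    writer.minMergeDocs = 1000;  // docs buffered in RAM before a new
                                 // segment is flushed (1.4 default: 10)
    writer.maxMergeDocs = Integer.MAX_VALUE;  // cap on docs per merged segment
    writer.close();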



Many thanks for any replies

Garrett





