No, there are no duplicate copies - I have the correct number of documents when I view the index through Luke, and the temp and main indexes don't overlap. The temporary index is destroyed after it is added to the main index. I'm currently at index version 159, and all of my .prx files come in at around 1435 MB (ouch).
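For what it's worth, one other thing worth ruling out is deleted documents still occupying space in the segments: numDocs() excludes deletions while maxDoc() does not, so a large gap between the two means dead data that only a merge/optimize will reclaim. A minimal sketch, assuming the standard Java Lucene API of the time (the index path is a placeholder):

import org.apache.lucene.index.IndexReader;

public class IndexCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder path: point this at the main index on ServerA.
        IndexReader reader = IndexReader.open("\\\\ServerA\\idx");
        // numDocs() counts live documents; maxDoc() also counts deleted
        // ones that still occupy space until the next merge/optimize.
        System.out.println("live docs (numDocs):    " + reader.numDocs());
        System.out.println("incl. deleted (maxDoc): " + reader.maxDoc());
        reader.close();
    }
}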
Thanks
Garrett

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: 06 December 2004 17:12
To: Lucene Users List
Subject: Re: addIndexes() Size

If I were you, I would first use Luke to peek at the index. You may find
something obvious there, like multiple copies of the same Document. Does
your temp index 'overlap' with the A index in terms of Documents? If so,
you will end up with multiple copies, as the addIndexes method doesn't
detect and remove duplicate Documents.

Otis

--- Garrett Heaver <[EMAIL PROTECTED]> wrote:

> Hi.
>
> It's probably really simple to explain this, but since I'm not up to
> speed on the way Lucene stores the data, I'm a little confused.
>
> I'm building an index which resides on Server A, with the Lucene
> service running on Server B. Not to bore you with the details, but
> because of the network transfer rate etc. I'm running the actual index
> on \\ServerA\idx and building a temp index at \\ServerB\idx\temp
> (obviously because the local FS is much faster for the service), then
> calling addIndexes to import the temp index into the ServerA index
> before destroying the ServerB index, holding for a bit and then
> checking for new documents.
>
> All works grand, BUT the size of the resultant index on ServerA is
> HUGE in comparison to one I'd build from start to finish (i.e. a
> simple addDocument index) - 38 GB for 220,000 unstored items cannot
> be right (to give you an idea of how mad this seems, the backed-up
> version of the database from which the data is pulled is only 2 GB).
>
> I've considered that it's perhaps the number of items that had to be
> integrated each time addIndexes was called - right now I'm adding
> around 10,000 at a time (I had done 1,000 at a time, but this looked
> like it was going to end up even larger still).
>
> I'm holding off twiddling the minMergeDocs and mergeFactor until I
> can get a better understanding of what's going on here.
>
> Many thanks for any replies
>
> Garrett
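For reference, the merge step described in the thread boils down to something like the following - a minimal sketch, assuming the Java Lucene 1.4-era API (the paths, the mergeFactor/minMergeDocs values, and the explicit optimize() call are illustrative assumptions, not the poster's actual code):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeTempIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder paths: the fast local temp index and the main index.
        Directory temp = FSDirectory.getDirectory("D:\\idx\\temp", false);
        IndexWriter writer =
            new IndexWriter("\\\\ServerA\\idx", new StandardAnalyzer(), false);

        // The tuning knobs mentioned in the thread; Lucene 1.4 exposes
        // them as public fields on IndexWriter.
        writer.mergeFactor = 10;     // segments merged together at a time
        writer.minMergeDocs = 1000;  // docs buffered before a new segment

        // Merge the temp index into the main one, then compact it.
        writer.addIndexes(new Directory[] { temp });
        writer.optimize();
        writer.close();
    }
}

In that API generation, addIndexes(Directory[]) itself ends with an optimize, collapsing the merged index into a single segment - worth keeping in mind when comparing the on-disk size against an index built with plain addDocument calls.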
