Doug Cutting wrote:


Something sounds very wrong for there to be that many files.

The maximum number of files should be around:

(7 + numIndexedFields) * (mergeFactor-1) * (log_base_mergeFactor(numDocs/minMergeDocs))

With 14M documents, log_10(14M/1000) is roughly 4, which gives, for you:

(7 + numIndexedFields) * 36 = 230k
7*36 + numIndexedFields*36 = 230k
numIndexedFields = (230k - 7*36) / 36 =~ 6k

So you'd have to have around 6k unique field names to get 230k files. Or something else must be wrong. Are you running on win32, where file deletion can be difficult?

With the typical handful of fields, one should never see more than hundreds of files.
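
(As a quick sanity check, the formula above can be written out in Java. This is only an illustrative sketch: the maxFiles helper is hypothetical, and it assumes mergeFactor=10 and minMergeDocs=1000 as in the numbers above.)

    // Rough upper bound on index file count, per the formula above:
    // (7 + numIndexedFields) * (mergeFactor - 1) * log_mergeFactor(numDocs / minMergeDocs)
    static long maxFiles(int numIndexedFields, int mergeFactor,
                         int minMergeDocs, long numDocs) {
        double levels = Math.log((double) numDocs / minMergeDocs)
                      / Math.log(mergeFactor);              // log base mergeFactor
        return Math.round((7 + numIndexedFields) * (mergeFactor - 1) * levels);
    }

    // maxFiles(10, 10, 1000, 14000000L) comes out around 630, i.e. hundreds of
    // files, nowhere near 230k, which is what points to ~6k indexed fields.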

We only have 13 fields... Though to be honest, I'm worried that even if I COULD run the optimize, it would run out of file handles.

This is very strange...

I'm going to increase minMergeDocs to 10000, run the full conversion on one box, and try an optimize (of the corrupt index) on another box. See which one finishes first.
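
(A minimal sketch of that tuning, assuming the Lucene 1.x API where mergeFactor and minMergeDocs are public fields on IndexWriter; the class name and index path below are placeholders.)

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            // "/path/to/index" is a placeholder for the real index directory.
            IndexWriter writer =
                new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
            writer.mergeFactor  = 10;     // default; each merge folds 10 segments together
            writer.minMergeDocs = 10000;  // buffer 10k docs in RAM before flushing a segment
            // ... writer.addDocument(doc) for each of the 14M documents ...
            writer.optimize();            // merge everything down to a single segment
            writer.close();
        }
    }

The tradeoff is RAM: all of the buffered documents are held in memory until the segment is flushed, but fewer on-disk segments get created during the load.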

I assume the speed of optimize() can be increased the same way that indexing speed is increased...

Kevin

--

Please reply using PGP.

http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


