Otis Gospodnetic wrote:

I like option 3.  I've done it before, and it worked well.  I dealt
with very small indices, though, and if your indices are several tens
or hundred gigs, this may be hard for you.

Option 4: search can be performed on an index that is being modified
(update, delete, insert, optimize). You'd just have to make sure not
to recreate new IndexSearcher too frequently, if your index is being
modified often. Just change it every X index modification or every X
minutes, and you'll be fine.


Right now I'm thinking about #4... Disk may be cheap but a fast RAID 10 array with 100G twice isn't THAT cheap... That's the worse case scenario of course and most modern search clusters use cheap hardware though...

Also... since the new indexes are SO small (~100M) the merges would probably be easier on the machine than just doing a whole new write. Of course it's hard to make that argument with a 100G RAID array but we're using rysnc to avoid distribution of network IO so the CPU computation and network read would slow things down.

The only way around this is the re-upload the whole 100G index but even over gigabit ethernet this will take 15 minutes. This doesn't scale as we add more searchers.

Thanks for the feedback! I think now tha tI know that optmize is safe as long as I don't create a new reader... I'll be fine. I do have think about how I'm going to handle search result nav.

Kevin

--

Please reply using PGP.

http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to