Re: Real time indexing and distribution to lucene on separate boxes (long)

Kevin A. Burton Thu, 11 Mar 2004 13:42:24 -0800

Matalon wrote:

To clarify how option 3 works:

You have dira where the search is done and dirb where the indexing is
done. dirb grows when you add new items to it, and at some point you
swap and dirb becomes dira, but what do you do then?

The Searcher reloads and points to dira...

Also, how do you write from the indexer to the directory on the search box?

We rsync the content over...
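To make the swap concrete, here is a minimal sketch of the bookkeeping only. The class name `IndexDirs` and its methods are made up for illustration, and the rsync step and the actual Lucene searcher reload are assumed to happen elsewhere:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Lucene): tracks which directory is
// currently searched ("dira") and which one is being indexed into ("dirb").
class IndexDirs {
    private Path searchDir;
    private Path indexDir;

    IndexDirs(Path searchDir, Path indexDir) {
        this.searchDir = searchDir;
        this.indexDir = indexDir;
    }

    synchronized Path searchDir() { return searchDir; }

    synchronized Path indexDir() { return indexDir; }

    // Swap roles: the freshly built index becomes the search directory and
    // the old search directory becomes the next indexing target.
    // Returns the directory the searcher should reload from.
    synchronized Path swap() {
        Path previousSearchDir = searchDir;
        searchDir = indexDir;
        indexDir = previousSearchDir;
        return searchDir;
    }
}
```

After each swap the searcher would reopen whatever `swap()` returns. What to do with the stale copy (re-sync it, or rebuild it) is left open here.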

2. The index is NFS mounted. The indexer keeps writing to the index and,
at defined times, creates an NFS snapshot of the index. It then creates
an entry in a db to let the searcher know that a new snapshot has been
created.
The searcher checks the db once a minute to see if there's a new
snapshot. If there is one, it opens the index in the new snapshot and
swaps it for the old one. The code to do this is synchronized.

The nice thing about this solution is that you don't have just one copy
of the index and don't do any copying. But you need to use NFS and
snapshots.
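The poll-and-swap step described above could look roughly like this. It is only a sketch under assumptions: `SnapshotSearcherHolder` is a made-up name, the db lookup is abstracted as a `Supplier`, and opening the index in a snapshot is abstracted as a `Function`, so no real Lucene or NFS calls appear here:

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical holder: S stands in for whatever searcher type is used.
class SnapshotSearcherHolder<S> {
    private final Supplier<String> latestSnapshot; // assumed: reads the newest snapshot id from the db
    private final Function<String, S> opener;      // assumed: opens the index inside that snapshot
    private String currentSnapshot;
    private S currentSearcher;

    SnapshotSearcherHolder(Supplier<String> latestSnapshot, Function<String, S> opener) {
        this.latestSnapshot = latestSnapshot;
        this.opener = opener;
    }

    // Meant to be called about once a minute. Returns true if a new
    // snapshot was found and the searcher was swapped.
    synchronized boolean refresh() {
        String latest = latestSnapshot.get();
        if (latest == null || latest.equals(currentSnapshot)) {
            return false; // nothing new since the last check
        }
        S fresh = opener.apply(latest); // open the new one before dropping the old one
        currentSearcher = fresh;
        currentSnapshot = latest;
        return true;
    }

    // Queries go through this accessor so they always see a fully opened searcher.
    synchronized S searcher() {
        return currentSearcher;
    }
}
```

The once-a-minute check could be driven by a `ScheduledExecutorService` calling `refresh()` at a fixed rate. Closing the old searcher once in-flight queries finish is left out of this sketch.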

Well... right now I'm thinking that if I can do a merge on the box with under 200M per commit, this won't be too much of a burden on the searchers, as long as it happens at regular intervals.

Right now, though, I'm going to have to test this to make sure I can keep running queries and an index merge on the same box, with the merge happening in a different process.

Going to send off an email about this in a minute :)

Kevin

--

Please reply using PGP.

http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
