Re: GData, updateable IndexSearcher

jason rutherglen Wed, 26 Apr 2006 10:20:18 -0700

Hi Doug,

Thanks for the info, makes sense.

> In particular, it supports scaling the number of *readers* well.

Yes, this is very true, and it is a good architecture.  In fact, because Java 
comes in 64-bit flavors, it allows for a smaller number of machines than 32-bit 
C systems with memory limitations, such as the current Google architecture.  

> Yes.  Folks have developed incrementally updateable IndexSearchers before, 
> but none is yet part of Lucene.

Interesting.  Does this mean there is a plan for incrementally updateable 
IndexSearchers to become part of Lucene?  Are there any negatives to updateable 
IndexSearchers?  

Thanks,

Jason

----- Original Message ----
From: Doug Cutting <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, April 25, 2006 9:04:47 PM
Subject: Re: GData

jason rutherglen wrote:
> Ah ok, think I found it: org.apache.nutch.indexer.FsDirectory no?
> 
> Couldn't this be used in Solr and distribute all the data rather than 
> master/slave it?

It's possible to search a Lucene index that lives in Hadoop's DFS, but 
not recommended.  It's very slow.  It's much faster to copy the index to 
a local drive.

The rsync approach, of only transmitting index diffs, is a very 
efficient way to distribute an index.  In particular, it supports 
scaling the number of *readers* well.

For read/write stuff (e.g. a calendar) such scaling might not be 
paramount.  Rather, you might be happy to route all requests for a 
particular calendar to a particular server.  The index/database could 
still be somehow replicated/synced, in case that server dies, but a 
single server can probably handle all requests for a particular 
index/database.  And keeping things coherent is much simpler in this case.

Doug
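
The suggestion above, routing all requests for a particular calendar to a 
particular server, could be sketched roughly as follows.  This is only an 
illustration under stated assumptions: the class CalendarRouter, the method 
serverFor, and the server names are hypothetical and are not part of Lucene, 
Solr, or Nutch.  It picks a server by hashing the calendar ID, which is one 
simple way to get a stable mapping, not necessarily the one Doug had in mind.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from Lucene, Solr, or Nutch.
final class CalendarRouter {
    private final List<String> servers;

    CalendarRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        // Defensive, immutable copy so the mapping cannot change underneath us.
        this.servers = List.copyOf(servers);
    }

    // Maps a calendar ID to one server. The same ID always maps to the same
    // server as long as the server list is unchanged.
    String serverFor(String calendarId) {
        // floorMod keeps the slot non-negative even for negative hash codes.
        int slot = Math.floorMod(calendarId.hashCode(), servers.size());
        return servers.get(slot);
    }
}
```

With such a mapping, every read and write for one calendar lands on the same 
index, which is what makes coherence simple.  Note that changing the server 
list remaps most calendars, and that failover would still depend on the 
replication/sync mentioned above.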
