Hi Doug,

Thanks for the info, makes sense.

> In particular, it supports scaling the number of *readers* well.

Yes, this is very true and a good architecture. In fact, because Java comes in 64-bit flavors, it allows for a smaller number of machines than 32-bit C systems that have memory limitations, like the current Google architecture.

> Yes. Folks have developed incrementally updateable IndexSearchers before,
> but none is yet part of Lucene.

Interesting. Does this mean there is a plan for incrementally updateable IndexSearchers to become part of Lucene? Are there any negatives to updateable IndexSearchers?

Thanks,
Jason

----- Original Message ----
From: Doug Cutting <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, April 25, 2006 9:04:47 PM
Subject: Re: GData

jason rutherglen wrote:
> Ah ok, think I found it: org.apache.nutch.indexer.FsDirectory no?
>
> Couldn't this be used in Solr and distribute all the data rather than
> master/slave it?

It's possible to search a Lucene index that lives in Hadoop's DFS, but not recommended. It's very slow. It's much faster to copy the index to a local drive.

The rsync approach, of only transmitting index diffs, is a very efficient way to distribute an index. In particular, it supports scaling the number of *readers* well.

For read/write stuff (e.g. a calendar) such scaling might not be paramount. Rather, you might be happy to route all requests for a particular calendar to a particular server. The index/database could still be somehow replicated/synced, in case that server dies, but a single server can probably handle all requests for a particular index/database. And keeping things coherent is much simpler in this case.

Doug
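
To make the "route all requests for a particular calendar to a particular server" idea concrete, here is a minimal sketch. It is only one possible reading of the suggestion, not anything from Lucene, Solr, or Nutch: the names `pick_server` and `pick_backup`, the server list, and the choice of hashing the calendar id are all assumptions made for illustration.

```python
import hashlib
from typing import Sequence


def _slot(calendar_id: str, count: int) -> int:
    # Hash the id so the same calendar always maps to the same slot,
    # independent of process or machine (unlike Python's built-in hash()).
    digest = hashlib.md5(calendar_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % count


def pick_server(calendar_id: str, servers: Sequence[str]) -> str:
    """Return the single server that owns all reads and writes for this calendar."""
    return servers[_slot(calendar_id, len(servers))]


def pick_backup(calendar_id: str, servers: Sequence[str]) -> str:
    """Return a hypothetical replica: the next server in the list.

    This stands in for the "somehow replicated/synced" copy used if the
    primary dies; the actual replication mechanism is not modeled here.
    """
    return servers[(_slot(calendar_id, len(servers)) + 1) % len(servers)]


# Hypothetical host names, for illustration only.
SERVERS = ["index-a", "index-b", "index-c"]
```

Because every request for a given calendar lands on one server, that server sees all updates in order, which is why coherence is simpler than in a setup with many writable copies. Note that a plain modulo mapping reshuffles most calendars when the server list changes; that trade-off is outside what the message discusses.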
