What kind of updates are these? New documents? Small changes to existing documents?
Are the changing fields important for searching?

If the updates are not involved in searches, then it would be much better to put the non-searched fields onto an alternative storage system. That would drive down the update rate dramatically and leave you with a pretty simple system.

If the updates *are* involved in searches, then you might consider using a system more like Katta than Solr. You can then build a new shard out of each update batch and broadcast a mass delete to all nodes just before adding the new shard to the system. This has the benefit of very fast updates and good balancing, but has the defect that your deletes are not persisted until you do a full index again. Your search nodes could write the updated index back to the persistent store, but that is scary without something like Hadoop to handle failed updates.

On Tue, Mar 31, 2009 at 6:51 AM, sunnyfr <[email protected]> wrote:
>
> I have about 14M documents. My index is about 11G.
> At the moment I update about 30,000 documents every 20 minutes.
> Lucene always merges data; what would you recommend?
> My replication costs too much for the slaves; they always bring back new
> index directories rather than just new segments.
>
> Is there a way to get around this issue? What would you recommend to
> people who need fresh updates on the slaves with a large amount of data?
> Thanks a lot,
>
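P.S. To make the shard-swap idea concrete, here is a minimal sketch of the flow: index the update batch as a new shard, broadcast a mass delete of the affected doc ids to every node, then attach the new shard. The `Node` and `Cluster` names are purely illustrative, not a real Katta or Solr API, and real shards would be Lucene indexes rather than dicts.

```python
class Node:
    """One search node holding some shards (each shard modeled as doc_id -> doc)."""

    def __init__(self):
        self.shards = []

    def delete(self, doc_ids):
        # Mass delete: drop the updated ids from every shard on this node.
        # Note: in the scheme described above this delete is in-memory only;
        # it is not persisted until the next full index build.
        for shard in self.shards:
            for doc_id in doc_ids:
                shard.pop(doc_id, None)


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def search(self, doc_id):
        # Query every shard on every node; newest shard wins because the
        # old copies were deleted before the new shard was attached.
        for node in self.nodes:
            for shard in node.shards:
                if doc_id in shard:
                    return shard[doc_id]
        return None

    def apply_update_batch(self, updates):
        new_shard = dict(updates)          # 1. index the batch as a new shard
        for node in self.nodes:            # 2. broadcast the mass delete
            node.delete(list(updates))
        self.nodes[0].shards.append(new_shard)  # 3. attach the new shard


# Example: one existing shard, then an update batch touching doc1.
cluster = Cluster([Node(), Node()])
cluster.nodes[0].shards.append({"doc1": "old", "doc2": "keep"})
cluster.apply_update_batch({"doc1": "new", "doc3": "added"})
```

After the swap, searches see the updated documents without a full reindex; the cost, as noted above, is that the deletes live only in the running nodes.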
