Just throwing in my 2 cents. If you're on a tight deadline, have you had a look at Solandra? We were already using Cassandra, so it was incredibly easy to get a scalable Solr installation up and running.

On 27 January 2011 08:17, Alex Cowell <alxc...@gmail.com> wrote:
> Hi Soheb,
>
> Sounds good! A few things I thought of:
>
> With regard to #1, would the list of shards to index to (if present) be
> exclusive or would we assume that the shard the update request was sent to
> should also be included? For example, say, using the example you gave, an
> update request was sent like so:
> java -Durl=http://localhost:7574/solr/collection1/update -Dshards=localhost:8983/solr -jar post.jar <list of XML files>
>
> should the documents be indexed exclusively to the 'shards list' (i.e. just
> localhost:8983/solr) or the 'shards list' & the server the request was sent
> to? So specifying something like this:
> java -Durl=http://localhost:7574/solr/collection1/update -Dshards=localhost:7574/solr -jar post.jar <list of XML files>
> would be equivalent to:
> java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar <list of XML files>
>
> For a default interface to decide which shard to index to, we were thinking
> of using either a simple hash function on the document's uniqueKey modulo
> the number of shards specified in the list (as mentioned here:
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Indexing) or
> some sort of round robin method, indexing a document to each shard in turn,
> until there are no more documents left to index.
>
> Also, how will we deal with failures? Should we simply return a list of all
> documents which weren't indexed or have a retry period after the initial
> indexing?
>
> Regards,
>
> Alex
>
> On Wed, Jan 26, 2011 at 4:29 PM, Soheb Mahmood <soheb.luc...@gmail.com> wrote:
>
>> Hello,
>>
>> We are going to implement distributed indexing for Solr - without the
>> use of SolrCloud (so it can be easily up-scaled). We have a deadline by
>> February to get this done, so we need to get cracking ;)
>>
>> So far, we've had a look at the solr classes and thought about
>> distributed indexing on Solr, and we have come up with these ideas:
>>
>> 1. We plan to modify SimplePostTool to accommodate posting to specific
>> shards. We are going to add an optional system property to allow the
>> user to specify a list of shards to index to Solr.
>> Example of this being "java
>> -Durl=http://localhost:7574/solr/collection1/update
>> -Dshards=localhost:8983/solr,localhost:7574/solr -jar post.jar <list of
>> XML files>"
>>
>> We also plan to modify server request processing to handle distributed
>> indexing. We are looking at CommonsHttpSolrServer.java for ways to
>> accomplish this.
>>
>> With all these changes, we realise that we are only modifying the Java
>> version, and that other languages need to be updated to accommodate our
>> changes (e.g. perl). We were wondering if there was a simple way of
>> applying these changes we wrote in Java across all the other languages.
>>
>> 2. We are going to make an interface to handle distributed writing. We
>> plan for it to sit between the Solr server and the shards - if no shards
>> are specified, then the post.jar tool will work exactly the same way it
>> does now. However, if the user specifies shards for post.jar, then we
>> want a class that has extended our interface to kick into action.
>>
>> 3. We plan to test our results by acceptance testing (we run Solr and
>> see if it works ourselves) and writing a test class.
>>
>> Does anyone have any comments to share?
>>
>> Thanks,
>> Soheb Mahmood
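
In case it helps the discussion, here is a minimal sketch of the two default shard-selection strategies Alex mentions above (a hash of the uniqueKey modulo the number of shards, and round robin), together with parsing of the proposed -Dshards property. The ShardSelector interface, both implementing classes and parseShards are hypothetical names made up for illustration; none of this is existing Solr API, and String.hashCode() only stands in for whatever hash function ends up being chosen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShardSelectionSketch {

    /** Hypothetical interface: picks the shard a document should be indexed to. */
    public interface ShardSelector {
        String select(String uniqueKey, List<String> shards);
    }

    /** Hash of the document's uniqueKey modulo the number of shards. */
    public static class HashShardSelector implements ShardSelector {
        @Override
        public String select(String uniqueKey, List<String> shards) {
            // floorMod keeps the index non-negative even when hashCode() is negative.
            int index = Math.floorMod(uniqueKey.hashCode(), shards.size());
            return shards.get(index);
        }
    }

    /** Round robin: each call returns the next shard in the list, wrapping around. */
    public static class RoundRobinShardSelector implements ShardSelector {
        private final AtomicInteger next = new AtomicInteger();

        @Override
        public String select(String uniqueKey, List<String> shards) {
            // The uniqueKey is ignored; only the call order matters here.
            int index = Math.floorMod(next.getAndIncrement(), shards.size());
            return shards.get(index);
        }
    }

    /** Splits a comma-separated -Dshards value; null or empty means "no shards given". */
    public static List<String> parseShards(String property) {
        List<String> shards = new ArrayList<>();
        if (property == null) {
            return shards;
        }
        for (String part : property.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                shards.add(trimmed);
            }
        }
        return shards;
    }

    public static void main(String[] args) {
        // Falls back to the two shards from the example command if -Dshards is not set.
        List<String> shards = parseShards(
                System.getProperty("shards", "localhost:8983/solr,localhost:7574/solr"));
        ShardSelector byHash = new HashShardSelector();
        ShardSelector roundRobin = new RoundRobinShardSelector();

        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " hash -> " + byHash.select(id, shards)
                    + ", round robin -> " + roundRobin.select(id, shards));
        }
    }
}
```

The trade-off as I see it: the hash variant is deterministic, so re-posting a document with the same uniqueKey goes to the same shard as long as the shard list does not change, whereas round robin spreads documents evenly but can leave copies of a re-posted document on different shards.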