Hi Soheb,

Sounds good! A few things I thought of:

With regard to #1, would the list of shards to index to (if present) be
exclusive, or would we assume that the shard the update request was sent to
should also be included? For example, using the example you gave, say an
update request was sent like so:
java -Durl=http://localhost:7574/solr/collection1/update
-Dshards=localhost:8983/solr -jar post.jar <list of XML files>

Should the documents be indexed exclusively to the 'shards list' (i.e. just
localhost:8983/solr), or to the 'shards list' and the server the request was
sent to? In the latter case, specifying something like this:
java -Durl=http://localhost:7574/solr/collection1/update
-Dshards=localhost:7574/solr -jar post.jar <list of XML files>
would be equivalent to:
java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar
<list of XML files>
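
To make the two readings concrete, here is a rough sketch of how the
effective target list could be worked out under each one. ShardTargets,
resolve and the includeReceiver flag are names I made up for illustration;
nothing like this exists in Solr or post.jar today.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Solr: works out which shards receive documents.
class ShardTargets {
    // receivingShard:  the server the update request was sent to (from -Durl)
    // shardsParam:     the parsed -Dshards list, empty if the property is absent
    // includeReceiver: true = "shards list & receiving server", false = exclusive list
    static List<String> resolve(String receivingShard, List<String> shardsParam,
                                boolean includeReceiver) {
        // LinkedHashSet removes duplicates while keeping a stable order.
        Set<String> targets = new LinkedHashSet<>();
        // With no shards given, fall back to today's behaviour: index locally.
        if (shardsParam.isEmpty() || includeReceiver) {
            targets.add(receivingShard);
        }
        targets.addAll(shardsParam);
        return new ArrayList<>(targets);
    }
}
```

With includeReceiver set to true, -Dshards=localhost:7574/solr sent to
localhost:7574 resolves to the same single target as sending no shards at
all, which is the equivalence I am asking about.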

For a default interface to decide which shard to index to, we were thinking
of using either a simple hash function on the document's uniqueKey modulo
the number of shards specified in the list (as mentioned here:
http://wiki.apache.org/solr/DistributedSearch#Distributed_Indexing) or some
sort of round robin method, indexing a document to each shard in turn, until
there are no more documents left to index.
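
As a minimal sketch of the two default strategies, assuming a small
interface of our own: ShardSelector, HashShardSelector and
RoundRobinShardSelector are hypothetical names, not existing Solr classes,
and String.hashCode() merely stands in for whatever hash function we settle
on.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface: picks the shard a document should be indexed to.
interface ShardSelector {
    String selectShard(String uniqueKey, List<String> shards);
}

// Hash of the uniqueKey modulo the number of shards: the same key always
// maps to the same shard as long as the shard list does not change.
class HashShardSelector implements ShardSelector {
    public String selectShard(String uniqueKey, List<String> shards) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(uniqueKey.hashCode(), shards.size());
        return shards.get(index);
    }
}

// Round robin: ignores the key and hands documents to each shard in turn.
class RoundRobinShardSelector implements ShardSelector {
    private final AtomicInteger next = new AtomicInteger();

    public String selectShard(String uniqueKey, List<String> shards) {
        int index = Math.floorMod(next.getAndIncrement(), shards.size());
        return shards.get(index);
    }
}
```

One difference worth weighing: the hash approach sends a re-posted document
back to the shard that already holds it, so updates overwrite the old copy,
whereas round robin could leave duplicates of the same uniqueKey on
different shards.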

Also, how will we deal with failures? Should we simply return a list of all
documents which weren't indexed or have a retry period after the initial
indexing?
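
The two options could also be combined. Below is a rough sketch that retries
a bounded number of times and then returns whatever still failed.
RetryingIndexer and indexWithRetry are hypothetical names, and the sender
predicate is a stand-in for the real HTTP post to a shard (it returns true on
success).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Solr.
class RetryingIndexer {
    // Tries each document up to maxAttempts times and returns the ids of
    // the documents that were never indexed successfully.
    static List<String> indexWithRetry(List<String> docIds, Predicate<String> sender,
                                       int maxAttempts) {
        List<String> pending = new ArrayList<>(docIds);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            List<String> failed = new ArrayList<>();
            for (String id : pending) {
                if (!sender.test(id)) {
                    failed.add(id); // keep for the next attempt
                }
            }
            pending = failed;
        }
        return pending;
    }
}
```

A real version would presumably wait between attempts and report which shard
each failure came from; both are left out here to keep the sketch short.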

Regards,

Alex


On Wed, Jan 26, 2011 at 4:29 PM, Soheb Mahmood <soheb.luc...@gmail.com> wrote:

> Hello,
>
> We are going to implement distributed indexing for Solr - without the
> use of SolrCloud (so it can be easily up-scaled). We have a deadline by
> February to get this done, so we need to get cracking ;)
>
> So far, we've had a look at the solr classes and thought about
> distributed indexing on Solr, and we have come up with these ideas:
>
> 1. We plan to modify SimplePostTool to accommodate posting to specific
> shards. We are going to add an optional system property to allow the
> user to specify a list of shards to index to Solr.
> Example of this being "java
> -Durl=http://localhost:7574/solr/collection1/update
> -Dshards=localhost:8983/solr,localhost:7574/solr -jar post.jar <list of
> XML files>"
>
> We also plan to modify server request processing to handle distributed
> indexing. We are looking at CommonsHttpSolrServer.java for ways to
> accomplish this.
>
> With all these changes, we realise that we are only modifying the Java
> version, and that other languages need to be updated to accommodate our
> changes (e.g. perl). We were wondering if there was a simple way of
> applying these changes we wrote in Java across all the other languages.
>
> 2. We are going to make an interface to handle distributed writing. We
> plan for it to sit between the Solr server and the shards - if no shards
> are specified, then the post.jar tool will work exactly the same way it
> does now. However, if the user specifies shards for post.jar, then we
> want a class that has extended our interface to kick into action.
>
> 3. We plan to test our results by acceptance testing (we run Solr and
> see if it works ourselves) and writing a test class.
>
> Does anyone have any comments to share?
>
> Thanks,
> Soheb Mahmood
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
