Just throwing in my 2 cents. If you're on a tight deadline, have you had a
look at Solandra? We were already using Cassandra, so it was incredibly
easy to get a scalable Solr installation up and running.

On 27 January 2011 08:17, Alex Cowell <alxc...@gmail.com> wrote:

> Hi Soheb,
>
> Sounds good! A few things I thought of:
>
> With regard to #1, would the list of shards to index to (if present) be
> exclusive or would we assume that the shard the update request was sent to
> should also be included? For example, say, using the example you gave, an
> update request was sent like so:
> java -Durl=http://localhost:7574/solr/collection1/update
> -Dshards=localhost:8983/solr -jar post.jar <list of XML files>
>
> should the documents be indexed exclusively to the 'shards list' (i.e. just
> localhost:8983/solr) or to the 'shards list' and the server the request was
> sent to? So specifying something like this:
> java -Durl=http://localhost:7574/solr/collection1/update
> -Dshards=localhost:7574/solr -jar post.jar <list of XML files>
> would be equivalent to:
> java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar
> <list of XML files>
>
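
Inline, on the exclusive-vs-inclusive question: a rough sketch of the two readings side by side, in case it helps the discussion. The class and method names (`ShardTargets`, `exclusive`, `inclusive`) are made up for illustration and are not part of Solr; "receiver" is the shard the update request was sent to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardTargets {
    // Reading A: the shards list is exclusive. The receiving server is only
    // used when no shards are given (today's behaviour).
    public static List<String> exclusive(String receiver, List<String> shards) {
        List<String> targets = new ArrayList<>(shards);
        if (targets.isEmpty()) {
            targets.add(receiver);
        }
        return targets;
    }

    // Reading B: the receiving server is always a target, in addition to
    // whatever the shards list names.
    public static List<String> inclusive(String receiver, List<String> shards) {
        List<String> targets = new ArrayList<>(shards);
        if (!targets.contains(receiver)) {
            targets.add(receiver);
        }
        return targets;
    }

    public static void main(String[] args) {
        String receiver = "localhost:7574/solr";
        List<String> shards = Arrays.asList("localhost:8983/solr");
        System.out.println("exclusive: " + exclusive(receiver, shards));
        System.out.println("inclusive: " + inclusive(receiver, shards));
    }
}
```

Under either reading, a shards list naming only the receiver gives the same targets as passing no shards at all, which matches the equivalence Alex describes.
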
> For a default interface to decide which shard to index to, we were thinking
> of using either a simple hash function on the document's uniqueKey modulo
> the number of shards specified in the list (as mentioned here:
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Indexing) or
> some sort of round robin method, indexing a document to each shard in turn,
> until there are no more documents left to index.
>
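
Inline, on the two selection strategies: a minimal sketch of what each could look like, assuming shards are plain `host:port/path` strings as in the examples above. `ShardPicker`, `byHash` and `byRoundRobin` are hypothetical names, and `String.hashCode()` stands in for whatever hash function ends up being chosen.

```java
import java.util.Arrays;
import java.util.List;

public class ShardPicker {
    // Hash-based choice: the same uniqueKey always maps to the same shard,
    // as long as the shard list and its order do not change.
    public static String byHash(String uniqueKey, List<String> shards) {
        // floorMod keeps the index non-negative even if hashCode() is negative.
        int index = Math.floorMod(uniqueKey.hashCode(), shards.size());
        return shards.get(index);
    }

    // Round-robin choice: the document at position n in the batch goes to
    // shard n mod (number of shards), regardless of its key.
    public static String byRoundRobin(int docPosition, List<String> shards) {
        return shards.get(docPosition % shards.size());
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("localhost:8983/solr", "localhost:7574/solr");
        String[] ids = {"doc-1", "doc-2", "doc-3"};
        for (int i = 0; i < ids.length; i++) {
            System.out.println(ids[i] + ": hash -> " + byHash(ids[i], shards)
                    + ", round robin -> " + byRoundRobin(i, shards));
        }
    }
}
```

One difference worth weighing: with the hash approach, re-posting a document lands on the shard that already holds it, so an update replaces the old copy. With round robin, a re-posted document may land on a different shard and leave a duplicate behind.
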
> Also, how will we deal with failures? Should we simply return a list of all
> documents which weren't indexed or have a retry period after the initial
> indexing?
>
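
Inline, on failures: the two options are not mutually exclusive. A rough sketch that retries each document a bounded number of times and then reports whatever still failed. `FailureReport` and `indexAll` are hypothetical names, and the `send` predicate is a stand-in for the real HTTP post, not an existing Solr call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FailureReport {
    // Tries each document up to maxAttempts times and returns the ids that
    // never succeeded. 'send' returns true when the post was accepted.
    public static List<String> indexAll(List<String> docIds,
                                        Predicate<String> send,
                                        int maxAttempts) {
        List<String> failed = new ArrayList<>();
        for (String id : docIds) {
            boolean ok = false;
            for (int attempt = 0; attempt < maxAttempts && !ok; attempt++) {
                ok = send.test(id);
            }
            if (!ok) {
                failed.add(id);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc-1", "doc-2", "doc-3");
        // Simulated sender for the demo: doc-2 always fails.
        List<String> failed = indexAll(docs, id -> !id.equals("doc-2"), 3);
        System.out.println("Not indexed: " + failed);
    }
}
```

A real version would probably want a delay between attempts and would need to distinguish a shard that is down from a document that is malformed, since retrying only helps with the former.
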
> Regards,
>
> Alex
>
>
>
> On Wed, Jan 26, 2011 at 4:29 PM, Soheb Mahmood <soheb.luc...@gmail.com> wrote:
>
>> Hello,
>>
>> We are going to implement distributed indexing for Solr without using
>> SolrCloud (so it can be scaled up easily). Our deadline is in February,
>> so we need to get cracking ;)
>>
>> So far, we've had a look at the Solr classes and thought about
>> distributed indexing on Solr, and we have come up with these ideas:
>>
>> 1. We plan to modify SimplePostTool to accommodate posting to specific
>> shards. We are going to add an optional system property that lets the
>> user specify a list of shards to index to.
>> For example: "java
>> -Durl=http://localhost:7574/solr/collection1/update
>> -Dshards=localhost:8983/solr,localhost:7574/solr -jar post.jar <list of
>> XML files>"
>>
>> We also plan to modify server request processing to handle distributed
>> indexing. We are looking at CommonsHttpSolrServer.java for ways to
>> accomplish this.
>>
>> With all these changes, we realise that we are only modifying the Java
>> version, and that other languages need to be updated to accommodate our
>> changes (e.g. Perl). We were wondering if there is a simple way of
>> applying the changes we write in Java across all the other languages.
>>
>> 2. We are going to make an interface to handle distributed writing. We
>> plan for it to sit between the Solr server and the shards - if no shards
>> are specified, then the post.jar tool will work exactly the same way it
>> does now. However, if the user specifies shards for post.jar, then we
>> want a class that implements our interface to kick into action.
>>
>> 3. We plan to test our results by acceptance testing (running Solr
>> ourselves to see if it works) and by writing a test class.
>>
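
Inline, on points 1 and 2: a minimal sketch of how the optional property and the interface could fit together, under several assumptions. The `url` and `shards` property names come from the example command above. Everything else is hypothetical: `ShardsProperty`, `ShardSelector`, `parseShards`, `target`, the fallback URL in `main`, and the guess that a shard's update URL is `http://` + shard + `/update`.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardsProperty {
    // Hypothetical interface for point 2: decides which shard gets a document.
    public interface ShardSelector {
        String select(String uniqueKey, List<String> shards);
    }

    // Splits the comma-separated -Dshards value. An unset or blank property
    // yields an empty list, meaning "behave exactly as post.jar does today".
    public static List<String> parseShards(String property) {
        List<String> shards = new ArrayList<>();
        if (property == null) {
            return shards;
        }
        for (String part : property.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                shards.add(trimmed);
            }
        }
        return shards;
    }

    // Returns the URL to post to: the -Durl value when no shards are given,
    // otherwise an update URL built from the shard the selector picks
    // (the URL shape is an assumption for illustration).
    public static String target(String url, String uniqueKey,
                                List<String> shards, ShardSelector selector) {
        if (shards.isEmpty()) {
            return url;
        }
        return "http://" + selector.select(uniqueKey, shards) + "/update";
    }

    public static void main(String[] args) {
        // The fallback URL here is an assumed default for the demo.
        String url = System.getProperty("url", "http://localhost:8983/solr/update");
        List<String> shards = parseShards(System.getProperty("shards"));
        ShardSelector first = (key, list) -> list.get(0); // trivial placeholder
        System.out.println(target(url, "doc-1", shards, first));
    }
}
```

Keeping the selector behind an interface means the hash and round-robin strategies discussed in the reply above could be swapped without touching the posting code.
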
>> Does anyone have any comments to share?
>>
>> Thanks,
>> Soheb Mahmood
>>
>>
>>
>>
>
