Hi Soheb,

On Wed, 26 Jan 2011 16:29 +0000, "Soheb Mahmood"
<soheb.luc...@gmail.com> wrote:

> We are going to implement distributed indexing for Solr - without the
> use of SolrCloud (so it can be easily up-scaled). We have a deadline by
> February to get this done, so we need to get cracking ;) 

:-)
 
> So far, we've had a look at the Solr classes and thought about
> distributed indexing on Solr, and we have come up with these ideas:
> 
> 1. We plan to modify SimplePostTool to accommodate posting to specific
> shards. We are going to add an optional system property to allow the
> user to specify a list of shards to index to Solr.
> Example of this being "java
> -Durl=http://localhost:7574/solr/collection1/update
> -Dshards=localhost:8983/solr,localhost:7574/solr -jar post.jar <list of
> XML files>"

As Yonik says, the SimplePostTool is really for testing. The shard
information must be contained within the URL, and processed by an
UpdateRequestHandler (called DistributedUpdateRequestHandler?). That
way, you can embed that data into the solrconfig.xml file as an
invariant or a default, or later it can be derived from Zookeeper in
SolrCloud.
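As a rough sketch, assume a request parameter called "shards" that carries a comma-separated list in the same form as the post.jar example above. The handler could turn it into one update URL per shard along these lines. The class name ShardParams, the method name, the assumed http:// scheme and the "/update" suffix are all hypothetical, not existing Solr API. An empty result means no shards were given, so the handler would index locally exactly as it does now.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Solr API): turns a "shards" request parameter such as
// "localhost:8983/solr,localhost:7574/solr" into one update URL per shard.
class ShardParams {

    static List<String> parseShardUpdateUrls(String shardsParam) {
        List<String> urls = new ArrayList<>();
        if (shardsParam == null) {
            return urls; // no shards given: the caller indexes locally, as today
        }
        for (String shard : shardsParam.split(",")) {
            String trimmed = shard.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas such as "a,,b"
            }
            // Assume plain http:// when the entry has no scheme of its own.
            String base = trimmed.startsWith("http://") || trimmed.startsWith("https://")
                    ? trimmed
                    : "http://" + trimmed;
            if (base.endsWith("/")) {
                base = base.substring(0, base.length() - 1); // avoid "//update"
            }
            urls.add(base + "/update"); // assumed path of each shard's update handler
        }
        return urls;
    }

    public static void main(String[] args) {
        // Same shard list as in the post.jar example.
        for (String url : parseShardUpdateUrls("localhost:8983/solr,localhost:7574/solr")) {
            System.out.println(url);
        }
    }
}
```

Because the value arrives as an ordinary request parameter, it can be fixed in solrconfig.xml as a default or invariant without the parsing code changing.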

> We also plan to modify server request processing to handle distributed
> indexing. We are looking at CommonsHttpSolrServer.java for ways to
> accomplish this.
> 
> With all these changes, we realise that we are only modifying the Java
> version, and that other languages need to be updated to accommodate our
> changes (e.g. Perl). We were wondering if there was a simple way of
> applying these changes we wrote in Java across all the other languages.

If you add this support to Solr itself, it is then the responsibility of
each client library to worry about supporting it.

You should only be focussing on the Solr DistributedUpdateRequestHandler
code rather than on any client libraries (other than the code you use as
your test harness).

> 2. We are going to make an interface to handle distributed writing. We
> plan for it to sit between the Solr server and the shards - if no shards
> are specified, then the post.jar tool will work exactly the same way it
> does now. However, if the user specifies shards for post.jar, then we
> want a class that implements our interface to kick into action.

The interface you need will be a ShardPolicy or some such. You will hand
it a document and either the number of shards or the list of shards, and
it will tell you which shard that document should go to. This interface will then
allow for pluggable shard policies, whether a simple modulo on the
document ID (for deterministic indexing) or a simple round-robin (where
placement depends only on arrival order, not on the document).
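A minimal sketch of such an interface, with the two policies mentioned above. ShardPolicy, shardFor(), ModuloShardPolicy and RoundRobinShardPolicy are hypothetical names. Documents are modelled as plain field maps with an "id" field to keep the sketch self-contained; inside Solr you would more likely pass a SolrInputDocument.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pluggable policy: given a document and the available shards,
// return the index (0-based) of the shard that should receive the document.
interface ShardPolicy {
    int shardFor(Map<String, Object> doc, List<String> shards);
}

// Deterministic: the same document ID always maps to the same shard,
// as long as the shard list itself does not change.
class ModuloShardPolicy implements ShardPolicy {
    @Override
    public int shardFor(Map<String, Object> doc, List<String> shards) {
        Object id = doc.get("id");
        if (id == null) {
            throw new IllegalArgumentException("document has no id field");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(id.toString().hashCode(), shards.size());
    }
}

// Placement depends only on arrival order: shards are used in turn.
class RoundRobinShardPolicy implements ShardPolicy {
    private final AtomicInteger next = new AtomicInteger(); // safe across request threads

    @Override
    public int shardFor(Map<String, Object> doc, List<String> shards) {
        return Math.floorMod(next.getAndIncrement(), shards.size());
    }
}

class ShardPolicyDemo {
    public static void main(String[] args) {
        List<String> shards = List.of("localhost:8983/solr", "localhost:7574/solr");
        ShardPolicy modulo = new ModuloShardPolicy();
        ShardPolicy roundRobin = new RoundRobinShardPolicy();
        for (String id : List.of("doc1", "doc2", "doc3")) {
            Map<String, Object> doc = Map.of("id", id);
            System.out.println(id + " modulo=" + modulo.shardFor(doc, shards)
                    + " roundRobin=" + roundRobin.shardFor(doc, shards));
        }
    }
}
```

String.hashCode() is fully specified by Java, so the modulo policy picks the same shard on every JVM, but only while the shard count stays fixed: adding a shard moves most documents to a different shard.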

You'll then need to split the documents that arrive at the
UpdateRequestHandler in the POST request, and forward each one to
whichever shard the ShardPolicy chose.
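The splitting step could look roughly like this: group the incoming documents into one batch per shard, then send each batch on. The chooser argument stands in for a ShardPolicy (it maps a document and the shard count to a shard index). ShardSplitter and its methods are hypothetical, and the actual HTTP forwarding is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

// Hypothetical helper: groups documents by target shard.
class ShardSplitter {

    // Returns one batch per shard; batch i holds the documents for shard i.
    static List<List<Map<String, Object>>> split(
            List<Map<String, Object>> docs,
            int shardCount,
            ToIntBiFunction<Map<String, Object>, Integer> chooser) {
        List<List<Map<String, Object>>> batches = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            batches.add(new ArrayList<>());
        }
        for (Map<String, Object> doc : docs) {
            int index = chooser.applyAsInt(doc, shardCount);
            if (index < 0 || index >= shardCount) {
                throw new IllegalStateException("shard index out of range: " + index);
            }
            batches.get(index).add(doc);
        }
        return batches;
    }

    private static Map<String, Object> doc(String id) {
        return Map.of("id", id);
    }

    public static void main(String[] args) {
        List<String> shards = List.of("localhost:8983/solr", "localhost:7574/solr");
        List<Map<String, Object>> docs = List.of(doc("doc1"), doc("doc2"), doc("doc3"));
        // Modulo on the document ID, standing in for a ShardPolicy implementation.
        ToIntBiFunction<Map<String, Object>, Integer> byId =
                (d, n) -> Math.floorMod(d.get("id").toString().hashCode(), n);
        List<List<Map<String, Object>>> batches = split(docs, shards.size(), byId);
        for (int i = 0; i < shards.size(); i++) {
            // A real handler would POST each non-empty batch to that shard's
            // update handler here; this sketch only prints the assignment.
            System.out.println(shards.get(i) + " <- " + batches.get(i));
        }
    }
}
```

Batching per shard means one update request per shard rather than one per document.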

> 3. We plan to test our results by acceptance testing (we run Solr and
> see if it works ourselves) and writing a test class.

Sounds great.

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source
