Re: distributed search on duplicate shards

mike anderson Thu, 30 Sep 2010 06:57:08 -0700

Thanks for the feedback. I ended up posting a patch to JIRA
(SOLR-2132<https://issues.apache.org/jira/browse/SOLR-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel>),
although I've made a few changes since that patch. Already from our initial
tests we've seen a 10% improvement in the 90% line for response times, which
translates to a 50% improvement in the average time.


It would be nice to know more about the current plans for SolrCloud and its
future development roadmap. I've seen a few threads on here asking for more
information, but it doesn't seem like a popular subject. I'll keep an eye on
it though.

Cheers,
Mike


On Wed, Sep 29, 2010 at 2:46 PM, Chris Hostetter
<[email protected]> wrote:

>
> : 4. The first shard from a set (solr1a, solr1b) to successfully return is
> : honored, and the other requests (solr1b, if solr1a responds first, for
> : instance) are removed/ignored
> : 5. The response is completed and returned as soon as one shard from each
> : set responds
>
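Steps 4 and 5 can be sketched with plain java.util.concurrent, independent of Solr: for each shard set, ExecutorService.invokeAny returns the result of the first replica that answers successfully and cancels the others, and the merged response waits only for one winner per set. This is a minimal sketch under stated assumptions, not the code attached to SOLR-2132; FirstReplicaWins, ShardClient and search are hypothetical names, and ShardClient stands in for whatever actually sends the HTTP request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FirstReplicaWins {
    /** Hypothetical stand-in for a request to one physical shard (e.g. "solr1a"). */
    interface ShardClient {
        String query(String shardUrl, String q) throws Exception;
    }

    /**
     * Sends q to every replica of every shard set. Per set, the first successful
     * answer is kept and the other requests are cancelled (step 4); the method
     * returns once every set has produced one answer (step 5).
     */
    static Map<String, String> search(Map<String, List<String>> shardSets, String q,
                                      ShardClient client)
            throws InterruptedException, ExecutionException {
        // A cached pool grows on demand, so the nested invokeAny calls below
        // cannot starve each other of threads.
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Map<String, Future<String>> perSet = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> set : shardSets.entrySet()) {
                List<Callable<String>> replicas = new ArrayList<>();
                for (String url : set.getValue()) {
                    replicas.add(() -> client.query(url, q));
                }
                // invokeAny returns the first successful result and cancels the
                // unfinished replicas; running it as its own task keeps sets parallel.
                perSet.put(set.getKey(), pool.submit(() -> pool.invokeAny(replicas)));
            }
            Map<String, String> merged = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : perSet.entrySet()) {
                merged.put(e.getKey(), e.getValue().get()); // wait for this set's winner
            }
            return merged;
        } finally {
            pool.shutdownNow(); // interrupt anything still running
        }
    }
}
```

If every replica in a set fails, invokeAny throws, so the whole search fails with an ExecutionException rather than returning a partial result; whether that is the right policy for a search front end is a separate design question.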
> It seems like a useful feature to me ... I know some folks who have
> (non-Solr/Lucene-based) custom search infrastructures that do roughly
> the same thing.
>
> : 1. What are the known disadvantages to such a strategy? (we've thought of
> : a few, like sets being out of sync, but they don't bother us too much)
>
> You wind up burning a lot of CPU, but that's not a disadvantage as much as
> it is a trade-off: the whole point of doing something like this is that
> you'd rather burn CPU (and waste network IO) in order to improve your
> worst-case latency.
>
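Under an idealized model that assumes replica latencies are independent, the trade-off can be put in numbers: if each replica misses a latency target with probability p, a set of r racing replicas misses it with probability p^r, and a request that must hear from all s shard sets misses it with probability 1 - (1 - p^r)^s, at up to r times the query CPU. The inputs p = 0.1 and s = 10 and the class name RedundancyTradeoff are assumptions for illustration, not figures from this thread.

```java
class RedundancyTradeoff {
    /** P(one shard set is slow) when r replicas race and the first answer wins. */
    static double slowSet(double p, int r) {
        return Math.pow(p, r); // slow only if every replica is slow (independence assumed)
    }

    /** P(the merged response is slow): it waits for all s sets, so one slow set suffices. */
    static double slowRequest(double p, int r, int s) {
        return 1 - Math.pow(1 - slowSet(p, r), s);
    }

    public static void main(String[] args) {
        double p = 0.10; // assumed: a replica misses the latency target 10% of the time
        int s = 10;      // assumed: the query fans out to 10 shard sets
        for (int r = 1; r <= 3; r++) {
            System.out.printf("replicas per set=%d  CPU: up to %dx  P(slow response)=%.4f%n",
                    r, r, slowRequest(p, r, s));
        }
    }
}
```

With these assumed inputs the probability of a slow response falls from about 0.65 with one replica per set to about 0.096 with two. Real replica latencies are often correlated (an expensive query tends to be slow on every copy, and the losing requests add load), so measured gains will usually be smaller than this model suggests.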
> : 2. What would this type of feature be called? This way I can open a
> : Jira ticket for it
>
> No idea ... "redundant shard requests" comes to mind.
>
> : 3. Is there a preferred way to do this? My current patch (which I can post
> : soon) works in the HTTPClient portion of SearchHandler. I keep a hash map
> : of the shard sets and cancel the Future<ShardResponse>s in the
> : corresponding set when each response comes back.
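The bookkeeping described in point 3 (a map from each pending request to its shard set, cancelling the set's other futures once one of them answers) can be sketched with an ExecutorCompletionService. This is an illustration under assumptions rather than the actual SearchHandler change: String stands in for ShardResponse, ShardSetCanceller and ShardClient are hypothetical names, and a set is treated as failed only when every one of its replicas has failed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ShardSetCanceller {
    /** Hypothetical transport; String stands in for ShardResponse. */
    interface ShardClient {
        String query(String shardUrl, String q) throws Exception;
    }

    /** Assumes every shard set lists at least one replica URL. */
    static Map<String, String> search(Map<String, List<String>> shardSets, String q,
                                      ShardClient client, ExecutorService pool)
            throws InterruptedException {
        CompletionService<String> ecs = new ExecutorCompletionService<>(pool);
        Map<Future<String>, String> setOf = new HashMap<>();       // request -> its shard set
        Map<String, List<Future<String>>> bySet = new HashMap<>(); // shard set -> its requests
        for (Map.Entry<String, List<String>> set : shardSets.entrySet()) {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : set.getValue()) {
                Future<String> f = ecs.submit(() -> client.query(url, q));
                futures.add(f);
                setOf.put(f, set.getKey());
            }
            bySet.put(set.getKey(), futures);
        }

        Map<String, String> winners = new HashMap<>();
        Map<String, Integer> failures = new HashMap<>();
        while (winners.size() < shardSets.size()) {
            Future<String> done = ecs.take(); // next finished (or cancelled) request
            String set = setOf.get(done);
            if (winners.containsKey(set)) {
                continue;                     // set already answered: ignore the loser
            }
            try {
                winners.put(set, done.get());
                for (Future<String> other : bySet.get(set)) {
                    other.cancel(true);       // no-op for the winner; interrupts or unschedules the rest
                }
            } catch (ExecutionException e) {
                int failed = failures.merge(set, 1, Integer::sum);
                if (failed == bySet.get(set).size()) {
                    throw new IllegalStateException("every replica of " + set + " failed",
                            e.getCause());
                }
            }
        }
        return winners;
    }
}
```

Because all requests share one completion queue, a single loop on the calling thread can watch every set, so the two maps need no synchronization.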
>         ...
> : P.S. I'd like to write a test for this feature, but it wasn't clear from
> : the distributed test how to do so. Could somebody point me in the right
> : direction (an existing test, perhaps) for how to accomplish this?
>
> I don't really have a good answer for either of those questions, but the
> one thing I can suggest is that you take a look at the SolrCloud branch
> and think about how this functionality would integrate with it (both in
> terms of implementation and in how SolrCloud unit tests work).
>
> As you mentioned, the current approach in SolrCloud is to load balance
> against identical shards on multiple nodes in the cluster, but that's not
> in conflict with your idea: the two can work in conjunction with each
> other. For example, imagine "shard1" has four physical instances:
> "shard1Ax", "shard1Ay", "shard1Bq" and "shard1Bp". A request for "shard1"
> could trigger two "redundant parallel shard requests" for "shard1A" and
> "shard1B", and each of those requests could then load balance between the
> respective underlying physical shards.
>
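The combined layout can be sketched as a two-level lookup: logical shard, then redundancy group, then physical instances. One instance is picked per group, and the resulting targets are the requests that would be raced against each other. The class name, the round-robin policy and the nested-map layout are hypothetical; this does not show SolrCloud's actual load balancer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class RedundantGroupsWithLoadBalancing {
    // logical shard -> redundancy group -> physical instances
    private final Map<String, Map<String, List<String>>> layout;
    // one round-robin cursor per (logical shard, group) pair
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    RedundantGroupsWithLoadBalancing(Map<String, Map<String, List<String>>> layout) {
        this.layout = layout;
    }

    /** One target per redundancy group, chosen round-robin within that group. */
    List<String> targetsFor(String logicalShard) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : layout.get(logicalShard).entrySet()) {
            List<String> instances = group.getValue();
            int i = cursors.computeIfAbsent(logicalShard + "/" + group.getKey(),
                    k -> new AtomicInteger()).getAndIncrement();
            // floorMod keeps the index valid even after the counter overflows
            targets.add(instances.get(Math.floorMod(i, instances.size())));
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("shard1A", List.of("shard1Ax", "shard1Ay"));
        groups.put("shard1B", List.of("shard1Bq", "shard1Bp"));
        RedundantGroupsWithLoadBalancing lb =
                new RedundantGroupsWithLoadBalancing(Map.of("shard1", groups));
        for (int n = 0; n < 3; n++) {
            System.out.println(lb.targetsFor("shard1")); // race these, keep the first answer
        }
    }
}
```

Keeping the cursor per group means load balancing and redundancy stay independent: adding a third group adds one more raced request without changing how instances inside the existing groups are chosen.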
>
>
> -Hoss
>
> --
> http://lucenerevolution.org/  ...  October 7-8, Boston
> http://bit.ly/stump-hoss      ...  Stump The Chump!
>
