Hello Erick.

On 17 June 2014 16:52, Erick Erickson <erickerick...@gmail.com> wrote:
> The sticky parts of that solution (off the top of my head) are
> assuring that the two masters have all the updates. How do you guarantee
> that the updates succeed and the two masters do, indeed, contain the exact
> same information?

Let's assume the simple case where the Solr DIH is used to pull data from a
central RDBMS into each master every 15 minutes. In that case, both masters
should be in sync even if the index versions are different. A monitoring
system could periodically check that the doc counts on the two masters are
almost the same (see the small SolrJ sketch at the end of this mail).

> There'd have to be logic to insure that when the switch was made, the
> entire index was replicated. How would the slave know which segments to
> replicate from the master? Especially since the segments would NOT be
> identical, the slaves would have to replicate the entire index...

In the event of a switch-over, I would expect the slaves to fetch the
whole/full index from master02. In production, the monitoring system should
also alert the support team.

> What to do when the first master came back up? Which one should be the
> "one true source"?

We have two options here:
- either stay on master02 until a human intervenes (a REST API reset or a
  restart of master02), or
- switch back to master01 automatically.

> The whole question of all the slaves knowing what master to ping is
> actually pretty ambiguous. What happens if slave 1 pings master1 and
> there's a temporary network glitch so it switches to master2. Meanwhile,
> due to timing, slave2 thinks master1 is still online. How to detect/track
> this?

I thought about this situation, and I must admit that it's a tricky one. We
should offer the option to configure the slaves to switch, let's say, only
after N failures (configurable) or after retrying for a configurable period
of time.
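To make that concrete, here is a minimal sketch of what the slave-side
retry/switch logic could look like. This is only an illustration of the
proposed behaviour, not existing Solr code: the class name, the maxFailures
parameter and the liveness probe are all made up. Only the two
ReplicationHandler commands mentioned in the comments (command=indexversion
and command=fetchindex) already exist today.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Hypothetical slave-side failover: keep polling the active master and,
 * after N consecutive failures, switch to the next masterUrl in the list.
 */
public class MasterFailover {

    private final String[] masterUrls; // e.g. { masterUrl01, masterUrl02 }
    private final int maxFailures;     // the configurable "N"
    private int active = 0;            // index of the master currently in use
    private int failures = 0;          // consecutive failures so far

    public MasterFailover(String[] masterUrls, int maxFailures) {
        this.masterUrls = masterUrls;
        this.maxFailures = maxFailures;
    }

    /** Called once per replication poll; returns the masterUrl to use. */
    public synchronized String masterForNextPoll() {
        if (isAlive(masterUrls[active])) {
            failures = 0;
        } else if (++failures >= maxFailures) {
            active = (active + 1) % masterUrls.length;
            failures = 0;
            // The new master's segments will not match the old one's, so
            // the caller should now issue a full
            // /replication?command=fetchindex against it.
        }
        return masterUrls[active];
    }

    /** Cheap liveness probe against the master's replication handler. */
    private boolean isAlive(String masterUrl) {
        try {
            URL url = new URL(masterUrl + "/replication?command=indexversion");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setConnectTimeout(2000);
            con.setReadTimeout(2000);
            return con.getResponseCode() == 200;
        } catch (IOException e) { // NoRouteToHostException, timeouts, ...
            return false;
        }
    }
}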
> When you start to spin these scenarios, you start needing some kind of
> cluster state accessible to all slaves, and then you start thinking
> about ZooKeeper and you're swiftly back to SolrCloud.
>
> The thinking in traditional Solr M/S situations avoids having two
> masters; if a master dies you "promote" one of the slaves to be the
> new master. The tricky bit here is to re-index data from before the
> time the old master died to the new master.
>
> So far, that's been "good enough" for M/S setups, and then SolrCloud
> came along, so I suspect not much effort would be put into something
> like what you suggest; the effort should be towards hardening
> SolrCloud...

Yes, I do understand that SolrCloud is the future. However, removing this
single point of failure from the traditional master-slave deployment model
would not require a lot of effort IMHO, and it would give a huge benefit in
terms of choice. The other question is: how many sites are on SolrCloud,
and how many are still on master-slave?

Thank you very much.

Arcadius.

> Best,
> Erick
>
> On Tue, Jun 17, 2014 at 6:54 AM, Alessandro Benedetti
> <benedetti.ale...@gmail.com> wrote:
> > Hello Arcadius,
> > why not simply move to SolrCloud, which already addresses fault
> > tolerance and high availability?
> > Simply imagine a configuration of:
> > 1 shard, replication factor of 3.
> > And you have an even better scenario than 2 masters and 1 slave.
> >
> > Cheers
> >
> > 2014-06-17 14:43 GMT+01:00 Arcadius Ahouansou <arcad...@menelic.com>:
> >
> >> Hello.
> >>
> >> SolrCloud has been out for a while now.
> >>
> >> However, there are still many installations running Solr4 in the
> >> traditional master-slave setup.
> >>
> >> Currently, the Solr master is the single point of failure of most
> >> master-slave deployments.
> >>
> >> This could be easily addressed by having:
> >>
> >> a- 2 independent Solr masters running side-by-side and being fed
> >> simultaneously,
> >>
> >> b- all slaves configured with masterUrl=masterUrl01,masterUrl02
> >> (needs to be implemented),
> >>
> >> c- by default, masterUrl01 used by all slaves,
> >>
> >> d- when the slaves catch an exception (like NoRouteToHostException or
> >> ConnectionTimedOutException etc.), they retry a couple of times
> >> before switching to using masterUrl02.
> >>
> >> I suppose you have thought about this issue before.
> >> So I would like to know whether there are issues with such a simple
> >> solution.
> >>
> >> This could also help deploy Solr across 2 different data-centers.
> >>
> >> Thank you very much.
> >>
> >> Arcadius.
> >
> > --
> > Benedetti Alessandro
> > Visiting card: http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England

--
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
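PS: the doc-count check mentioned at the top of this mail could be as
simple as the following SolrJ sketch. The host names, the core name and the
1% drift threshold are placeholders to adapt to the actual deployment.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

/** Compares numFound on the two masters and alerts when they drift apart. */
public class MasterDocCountCheck {

    public static void main(String[] args) throws SolrServerException {
        long count01 = numDocs("http://master01:8983/solr/collection1");
        long count02 = numDocs("http://master02:8983/solr/collection1");
        double drift = Math.abs(count01 - count02)
                / (double) Math.max(1L, Math.max(count01, count02));
        if (drift > 0.01) { // placeholder threshold: alert beyond 1% drift
            System.err.println("ALERT: masters out of sync: "
                    + count01 + " docs vs " + count02 + " docs");
        }
    }

    /** rows=0 keeps the query cheap: we only read numFound. */
    private static long numDocs(String masterUrl) throws SolrServerException {
        HttpSolrServer server = new HttpSolrServer(masterUrl);
        try {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);
            return server.query(q).getResults().getNumFound();
        } finally {
            server.shutdown();
        }
    }
}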