Hello Erick.

On 17 June 2014 16:52, Erick Erickson <erickerick...@gmail.com> wrote:
> The sticky parts of that solution (off the top of my head) are
> assuring that the two masters have all the updates. How do you guarantee
> that the updates succeed and the two masters do, indeed, contain the exact
> same information?

Let's assume the simple case where the Solr DIH is used to pull data from a
central RDBMS into each master every 15 minutes. In that case, both masters
should be in sync even if the index versions are different. A monitoring
system could periodically check that the doc counts on the two masters are
almost the same (see the small SolrJ sketch at the end of this mail).

> There'd have to be logic to insure that when the switch was made, the
> entire index was replicated. How would the slave know which segments to
> replicate from the master? Especially since the segments would NOT be
> identical, the slaves would have to replicate the entire index...

In the event of a switch-over, I would expect the slaves to fetch the
whole/full index from master02. In production, the monitoring system should
also alert the support team.

> What to do when the first master came back up? Which one should be the
> "one true source"?

We have two options here:
- either stay on master02 until a human intervenes (a REST API reset or a
  restart of master02), or
- switch back to master01 automatically.

> The whole question of all the slaves knowing what master to ping is
> actually pretty ambiguous. What happens if slave 1 pings master1 and
> there's a temporary network glitch so it switches to master2. Meanwhile,
> due to timing, slave2 thinks master1 is still online. How to detect/track
> this?

I thought about this situation, and I must admit that it's a tricky one. We
should offer the option to configure the slaves to switch, let's say, only
after N failures (configurable) or after retrying for a configurable period
of time.
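To make that concrete, here is a minimal sketch of what the slave-side
retry/switch logic could look like. This is only an illustration of the
proposed behaviour, not existing Solr code: the class name, the maxFailures
parameter and the liveness probe are all made up. Only the two
ReplicationHandler commands mentioned in the comments (command=indexversion
and command=fetchindex) already exist today.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Hypothetical slave-side failover: keep polling the active master and,
 * after N consecutive failures, switch to the next masterUrl in the list.
 */
public class MasterFailover {

    private final String[] masterUrls; // e.g. { masterUrl01, masterUrl02 }
    private final int maxFailures;     // the configurable "N"
    private int active = 0;            // index of the master currently in use
    private int failures = 0;          // consecutive failures so far

    public MasterFailover(String[] masterUrls, int maxFailures) {
        this.masterUrls = masterUrls;
        this.maxFailures = maxFailures;
    }

    /** Called once per replication poll; returns the masterUrl to use. */
    public synchronized String masterForNextPoll() {
        if (isAlive(masterUrls[active])) {
            failures = 0;
        } else if (++failures >= maxFailures) {
            active = (active + 1) % masterUrls.length;
            failures = 0;
            // The new master's segments will not match the old one's, so
            // the caller should now issue a full
            // /replication?command=fetchindex against it.
        }
        return masterUrls[active];
    }

    /** Cheap liveness probe against the master's replication handler. */
    private boolean isAlive(String masterUrl) {
        try {
            URL url = new URL(masterUrl + "/replication?command=indexversion");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setConnectTimeout(2000);
            con.setReadTimeout(2000);
            return con.getResponseCode() == 200;
        } catch (IOException e) { // NoRouteToHostException, timeouts, ...
            return false;
        }
    }
}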
> When you start to spin these scenarios, you start needing some kind of
> cluster state accessible to all slaves, and then you start thinking
> about ZooKeeper and you're swiftly back to SolrCloud.
>
> The thinking in traditional Solr M/S situations avoids having two
> masters; if a master dies you "promote" one of the slaves to be the
> new master. The tricky bit here is to re-index data from before the
> time the old master died to the new master.
>
> So far, that's been "good enough" for M/S setups, and then SolrCloud
> came along, so I suspect not much effort would be put into something
> like what you suggest; the effort should be towards hardening
> SolrCloud...

Yes, I do understand that SolrCloud is the future. However, removing this
single point of failure from the traditional master-slave deployment model
would not require a lot of effort IMHO, and it would give a huge benefit in
terms of choice. The other question is: how many sites are on SolrCloud,
and how many are still on master-slave?

Thank you very much.

Arcadius.

> Best,
> Erick
>
> On Tue, Jun 17, 2014 at 6:54 AM, Alessandro Benedetti
> <benedetti.ale...@gmail.com> wrote:
> > Hello Arcadius,
> > why not simply move to SolrCloud, which already addresses fault
> > tolerance and high availability?
> > Simply imagine a configuration of:
> > 1 shard, replication factor of 3.
> > And you have an even better scenario than 2 masters and 1 slave.
> >
> > Cheers
> >
> > 2014-06-17 14:43 GMT+01:00 Arcadius Ahouansou <arcad...@menelic.com>:
> >
> >> Hello.
> >>
> >> SolrCloud has been out for a while now.
> >>
> >> However, there are still many installations running Solr4 in the
> >> traditional master-slave setup.
> >>
> >> Currently, the Solr master is the single point of failure of most
> >> master-slave deployments.
> >>
> >> This could be easily addressed by having:
> >>
> >> a- 2 independent Solr masters running side-by-side and being fed
> >> simultaneously,
> >>
> >> b- all slaves configured with masterUrl=masterUrl01,masterUrl02
> >> (needs to be implemented),
> >>
> >> c- by default, masterUrl01 used by all slaves,
> >>
> >> d- when the slaves catch an exception (like NoRouteToHostException or
> >> ConnectionTimedOutException etc.), they retry a couple of times
> >> before switching to using masterUrl02.
> >>
> >> I suppose you have thought about this issue before.
> >> So I would like to know whether there are issues with such a simple
> >> solution.
> >>
> >> This could also help deploy Solr across 2 different data-centers.
> >>
> >> Thank you very much.
> >>
> >> Arcadius.
> >
> > --
> > Benedetti Alessandro
> > Visiting card: http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England

--
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
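PS: the doc-count check mentioned at the top of this mail could be as
simple as the following SolrJ sketch. The host names, the core name and the
1% drift threshold are placeholders to adapt to the actual deployment.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

/** Compares numFound on the two masters and alerts when they drift apart. */
public class MasterDocCountCheck {

    public static void main(String[] args) throws SolrServerException {
        long count01 = numDocs("http://master01:8983/solr/collection1");
        long count02 = numDocs("http://master02:8983/solr/collection1");
        double drift = Math.abs(count01 - count02)
                / (double) Math.max(1L, Math.max(count01, count02));
        if (drift > 0.01) { // placeholder threshold: alert beyond 1% drift
            System.err.println("ALERT: masters out of sync: "
                    + count01 + " docs vs " + count02 + " docs");
        }
    }

    /** rows=0 keeps the query cheap: we only read numFound. */
    private static long numDocs(String masterUrl) throws SolrServerException {
        HttpSolrServer server = new HttpSolrServer(masterUrl);
        try {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);
            return server.query(q).getResults().getNumFound();
        } finally {
            server.shutdown();
        }
    }
}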