I'm beginning to think that your starting point is going to make improvement difficult ...
On Tue, Aug 2, 2016 at 5:48 PM, Adrian <[email protected]> wrote:

> For now yes, this particular server is self contained. If either the
> front or the back end fails, one is no good without the other and
> separating them would mean doubling resources. Also don't want to touch
> existing systems. Not yet at least.

If you want to get more resiliency, you're going to have to commit more
resources. Now I appreciate that this is to be spread out over time, but
if your end goal is "resiliency using a single production host with a
warm spare" you can only achieve that by changing your production
software to also cluster into a warm-spare model, not by fixing it
out-of-band from the OS. You can certainly improve things from the OS
(rsync, DRBD and so on) but you'll never end up with a perfect fit.
Obviously you're realistic about this :-)

> Ideally all the data created before the crash will be replicated to the
> mirror so after the switch the work can resume without loss of data.

So a replication model usually says "once every period, we'll
synchronise state", and therefore when the primary fails you will lose
any update made since the last sync event. A clustered model says "every
update is committed to multiple locations automatically", and you get to
balance risk against performance; but if both members of the cluster are
'identical' and 'on the same LAN' the commits should pretty much always
happen simultaneously.

> I have to keep an open mind here while also
> balancing some budgets.

Absolutely. You need a Risk assessment - take the simple
Risk = Likelihood * Impact model, work out the cost of failure and this
will give you a guide to how much money to sink into addressing the
situation. So right now, the business impact seems to be very low.
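
To put some numbers on the Risk = Likelihood * Impact model, here is a
minimal Python sketch. The function name and every figure in it are made
up for illustration and are not taken from your setup; treat the result
as a rough guide to what is worth spending per year, nothing more.

```python
# Rough sketch of the Risk = Likelihood * Impact model.
# All names and figures are assumptions, invented for illustration only.

def annual_risk(failures_per_year: float, cost_per_failure: float) -> float:
    """Expected yearly loss: likelihood of failure times its impact."""
    return failures_per_year * cost_per_failure

likelihood = 0.5          # assumed: one failure every two years
impact_now = 1_000.0      # assumed cost of an outage today
impact_later = 20_000.0   # assumed cost once the business depends on it

risk_now = annual_risk(likelihood, impact_now)
risk_later = annual_risk(likelihood, impact_later)

print(f"risk now:   {risk_now:.0f} per year")
print(f"risk later: {risk_later:.0f} per year")
```

With the same likelihood, the risk scales directly with the impact, so
the comparison is only as good as the cost estimates you feed it.
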
We don't expect the likelihood of failure to change, but in the future
the impact will be higher; therefore the combined Risk will be higher,
and the business should invest in the production platform's resiliency
at a rate that matches its growth and its dependency on the platform.
Many people forget to review the Risk over time, and usually that
results in an under-resourced system that will let them down when it
fails. On the other hand, many people want "perfect from the start", and
therefore spend too much on the initial solution.

> I have already organised a test and a development environment with
> separate hardware.

It would be interesting if the test environment could be a full
production cluster member when not being used for testing; that way you
also get to regularly *test* how to change the cluster membership/state
:-) Just like restores :-)

> Yes, with multiple VMs that is the first thing that springs to mind,
> replication in pairs with each VM responsible for its mirror. I guess
> in the end it will be the most cost effective option, providing that I
> can reduce everything to file system changes. Of which I'm not sure yet
> at this stage. But if I can, looping a script will be interesting.

I'm assuming that the host box is running Linux, but I'm not assuming
anything about the VMs. However, if the VMs are also Linux, I'd question
why you need to maintain them in such a separate and difficult-to-manage
manner. You have to do OS updates on the host and on each VM ... and
backups ... and everything. You might achieve better *replication* if
the processes in these VMs were all on the same box. Containers might be
more manageable than VMs, too.

> If you or anyone else knows of such solution, I'm open to suggestion.

Really sorry that I don't actually have "a solution" to offer though :-(

-jim
