On Jun 19, 2009, at 5:55 PM, [email protected] wrote:
> I claim that it's not possible for two servers to have the same
> state, or at least not with acceptable performance.

You can claim that, but I think there's evidence to the contrary.
VMotion works by bringing the target host up to speed and then keeping
its state in lock step with the source, including any changes made
since a given portion of the "state" was migrated. We used to VMotion
our web servers from host to host all the time, and we measure our
per-request responsiveness down to the millisecond. We never noticed
any performance issue before, during, or after a VMotion event.

> even if you limit your definition of 'state' to the contents of the  
> logical disks, you can't have real-time replication of disk contents  
> between machines at anything close to the same speed that you can  
> make changes to the local disk.

In the usual configuration, the disks are "the same", i.e., both the
"live" and "slave" hosts simply look at the same SAN/NAS mount. So
there's no issue there.

> these bandwidth limits mean that even 'live migration' doesn't mean  
> zero outage, at some point you need to pause the VM on one machine  
> to copy the last of the changes to the new machine and start it up.  
> VMware takes advantage of the fact that most memory/disk is not
> normally changed, so it copies everything, and then goes back and
> copies everything that has changed, until it decides that it's not
> making sufficient progress, at which time it must pause the app to
> move it. This normally makes the outage small enough that most  
> people don't notice it, but it's not a case of 'not a moment of  
> outage' as another poster in this thread commented.

One could, I presume, make the claim that ESX migration would not be
sufficient for hard real-time (RTOS) purposes, but for the majority of
folks out there, including those with high-I/O, high-memory workloads,
the impact of the migration pause is completely unnoticeable.
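
To put rough numbers on the copy-then-recopy loop described in the
quoted text, here is a minimal sketch of iterative pre-copy. It is a
toy model, not VMware's actual algorithm: the function name, the
stop threshold, the round limit, and the constant dirty rate are all
assumptions made up for illustration.

```python
def simulate_precopy(total_mb, dirty_mb_per_s, link_mb_per_s,
                     stop_threshold_mb=64.0, max_rounds=30):
    """Toy model of iterative pre-copy live migration.

    Returns (rounds, pause_seconds). All parameters are hypothetical;
    real hypervisors use their own heuristics.
    """
    if link_mb_per_s <= 0:
        raise ValueError("link_mb_per_s must be positive")

    remaining = float(total_mb)  # data to send in the next round
    rounds = 0
    while rounds < max_rounds and remaining > stop_threshold_mb:
        # Copy what is outstanding while the VM keeps running.
        round_time = remaining / link_mb_per_s
        # Memory dirtied during that copy must be sent again.
        dirtied = min(float(total_mb), dirty_mb_per_s * round_time)
        rounds += 1
        if dirtied >= remaining:
            # Not making progress: give up iterating and pause.
            remaining = dirtied
            break
        remaining = dirtied

    # The VM is paused while the last dirty set is copied.
    pause_seconds = remaining / link_mb_per_s
    return rounds, pause_seconds


if __name__ == "__main__":
    rounds, pause = simulate_precopy(8192, 50, 1000)
    print(f"rounds={rounds} pause={pause * 1000:.1f} ms")
```

With those made-up inputs (8 GB of state, 50 MB/s dirtied, 1000 MB/s
link) the model converges in two rounds and pauses for roughly 20 ms.
If the dirty rate exceeds the link rate it never converges, which is
the case the quoted poster is worried about.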

> and at some point simple bandwidth and latency (speed of light)  
> limits mean that you can't eliminate all downtime.

And I think for MOST organizations, the level of "risk" you're
describing is not one they ever encounter. :-)
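
For a sense of scale on the speed-of-light point, a back-of-the-envelope
sketch of the minimum round-trip time between two hosts. It assumes
light travels through fiber at about 200,000 km/s (roughly two thirds
of c) and ignores switching, queuing, and protocol overhead; the
function name and the distance are made up for illustration.

```python
# Approximate speed of light in optical fiber (about 2/3 of c).
FIBER_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fiber, in milliseconds."""
    return 2.0 * distance_km / FIBER_KM_PER_S * 1000.0


if __name__ == "__main__":
    # Two hosts 100 km apart (hypothetical distance).
    print(f"{min_round_trip_ms(100):.2f} ms")
```

At 100 km the floor is about 1 ms per round trip, so the limit is
real but small next to the pause most migrations already incur.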

Cheers,
D

_______________________________________________
Discuss mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
