Re: Multi-machine mirroring choices

Sven W Mon, 21 Jul 2008 08:24:18 -0700


Pete French presumably uttered the following on 07/21/08 07:08:

The *big* issue I have right now is dealing with the slave machine going
down. Once the master no longer has a connection to the ggated devices,
all processes trying to use the device hang in D state. I have tried
pkill'ing ggatec to no avail, and ggatec destroy returns a message saying
gctl is busy. Trying ggatec destroy -f panics the machine.


Oddly enough, this was the issue I had with iSCSI which made me move
to using ggated instead. On our machines I use '-t 10' as an argument to
ggatec, and this makes it time out once the connection has been down for
a certain amount of time. I am using gmirror on top, not ZFS, and this
handled the drive vanishing from the mirror quite happily. I haven't
tried it with ZFS, which may not like having the device suddenly disappear.

-pete.
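Pete's arrangement (ggated exporting a disk on the slave, ggatec with -t 10 on the master, gmirror on top) might be set up roughly as below. This is only a sketch: the addresses 192.168.0.1 and 192.168.0.2, the disk ad4, unit 0 and the mirror name gm0 are hypothetical, and the flags follow my reading of ggated(8), ggatec(8) and gmirror(8), where -t is the number of seconds before an unanswered I/O request is canceled.

```sh
# --- On the slave (hypothetical address 192.168.0.2) ---
# /etc/gg.exports: give the master (hypothetical 192.168.0.1) read-write
# access to the local disk ad4 (hypothetical). Overwrites the file.
echo "192.168.0.1 RW /dev/ad4" > /etc/gg.exports
ggated

# --- On the master ---
# Attach the slave's disk as /dev/ggate0; -t 10 cancels I/O requests
# that have gone unanswered for 10 seconds.
ggatec create -t 10 -u 0 192.168.0.2 /dev/ad4

# Mirror the local disk with the remote one using gmirror (Pete's setup).
gmirror load
gmirror label -v gm0 ad4 ggate0

# The ZFS variant tested below would instead be something like
# (pool name "data1" assumed from the /data1 path):
#   zpool create data1 mirror ad4 ggate0
```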

What I have found is that the master machine will lock up if the slave disappears during a large file transfer. I tested this by setting up a zpool mirror on the master using a ggatec device from the slave. Then I:


1. pkill'ed ggated on the slave machine.

2. Ran, on the master:

   dd if=/dev/zero of=/data1/testfile2 bs=16k count=8192   [128MB]

The dd command finished, and /var/log/messages showed I/O errors to the slave drive, as expected. The messages also showed ggatec trying to reconnect every 10 seconds (ggatec was started with the -t 10 parameter).

Finally, ZFS marked the drive unavailable, which then allowed me to run ggatec destroy -u0 without getting the "ioctl(/dev/ggctl): Device busy" error message. (By the way, ggatec destroy does not kill the "ggatec create" process that created the device in the first place; I had to pkill ggatec to get that to stop. Bug?)

The above behavior would be acceptable for multi-machine mirroring, as it would be scriptable.
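Since the small-write case is scriptable, here is a minimal sketch of such a watchdog. It assumes a pool named data1 (taken from the /data1 path), ggatec unit 0 (so the device is ggate0), the usual NAME/STATE column layout of zpool status, and a 10-second poll; none of these names come from a tested script in this thread. Once ZFS reports the ggate device as UNAVAIL (or FAULTED, REMOVED, OFFLINE), it runs ggatec destroy and then kills the leftover "ggatec create" process. It cannot help with the large-write lockup, since by then the master is hung.

```sh
#!/bin/sh
# Hypothetical watchdog for a ZFS mirror with one ggate half.
# Assumptions: pool "data1", ggatec unit 0 -> device "ggate0".
POOL=${POOL:-data1}
UNIT=${UNIT:-0}
DEV="ggate${UNIT}"

# Print the STATE column for device $1 from `zpool status` text on stdin
# (prints nothing if the device is not listed).
ggate_state() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}

# Poll every 10 s; once ZFS has given up on the ggate device,
# destroy it and kill the "ggatec create" process that destroy leaves behind.
watch_ggate() {
    while :; do
        state=$(zpool status "$POOL" | ggate_state "$DEV")
        case "$state" in
            UNAVAIL|FAULTED|REMOVED|OFFLINE)
                logger "ggate watchdog: $DEV is $state, detaching"
                ggatec destroy -u "$UNIT"
                pkill -f "ggatec create"
                return 0
                ;;
        esac
        sleep 10
    done
}

# Only start polling when invoked as "watchdog.sh run", so the
# functions can also be sourced on their own.
if [ "${1:-}" = "run" ]; then
    watch_ggate
fi
```

Run it in the background on the master, e.g. `sh watchdog.sh run &`, after the mirror is created.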


The problem comes with large writes. I tried to repeat the above with

dd if=/dev/zero of=/data1/testfile2 bs=16k count=32768 [512MB]

which then locks ZFS, and ultimately the system itself. It seems that once the write size/buffer is full, ZFS is unable to fail/unavail the slave drive, and the entire system becomes unresponsive (I cannot even ssh into it).
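For reference, the two dd runs differ only in count. With bs=16k (16 KiB per block in dd), the totals work out as follows, so the failing run pushes four times as much data at the pool while the slave is gone:

```latex
\begin{aligned}
16\,\text{KiB} \times 8192  &= 131072\,\text{KiB} = 128\,\text{MiB} \\
16\,\text{KiB} \times 32768 &= 524288\,\text{KiB} = 512\,\text{MiB}
\end{aligned}
```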

The bottom line is that without some type of "timeout" or "time to fail" (bad I/O to fail?), zpool + ggate[cd] seems to be an unworkable solution. This is actually a shame, as the recovery process of swapping from master to slave and back again was so much cleaner and faster than with gmirror.



Sven
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable