Re: HAST instability

Daniel Kalchev Tue, 31 May 2011 08:09:36 -0700

On 31.05.11 17:08, Mikolaj Golub wrote:

As I wrote privately, it would be nice to see both netstat and hast logs (from 
both nodes) for the same rather long period, when several cases occurred. It 
would be good to place them somewhere on the web so other guys could access them 
too, as I will be offline for 7-10 days and will not be able to help you until 
I am back.

The test has finished after running for almost three hours, so here is the collected data:


(for the duration of the test, on the secondary node)
systat -if
                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average

      Interface           Traffic               Peak                Total
            lo0  in      0.000 KB/s          0.000 KB/s            1.126 KB
                 out     0.000 KB/s          0.000 KB/s            1.126 KB

            ix1  in      0.003 KB/s        230.590 MB/s          614.688 GB
                 out     0.054 KB/s          7.425 MB/s           19.910 GB

           igb0  in      0.025 KB/s          3.636 KB/s          566.897 KB
                 out     0.072 KB/s          4.296 KB/s            1.091 MB


The primary node is b1a, the secondary node is b1b.
kernel (built just after csup update):

FreeBSD b1a 8.2-STABLE FreeBSD 8.2-STABLE #1: Mon May 30 14:17:50 EEST 2011 root@b1a:/usr/obj/usr/src/sys/GENERIC amd64


from primary
messages: http://news.digsys.bg/~admin/hast/test31may/b1a-messages
netstat -in: http://news.digsys.bg/~admin/hast/test31may/b1a-netstat -in
netstat -s: http://news.digsys.bg/~admin/hast/test31may/b1a-netstat-s

from secondary
messages: http://news.digsys.bg/~admin/hast/test31may/b1b-messages
netstat -in: http://news.digsys.bg/~admin/hast/test31may/b1b-netstat -in
netstat -s: http://news.digsys.bg/~admin/hast/test31may/b1b-netstat-s

  DK>  One additional note: while playing with this setup, I tried to
  DK>  simulate local disk going away in the hope HAST will switch to using
  DK>  the remote disk. Instead of asking someone at the site to pull out the
  DK>  drive, I just issued on the primary

  DK>  hastctl role init data0

  DK>  which resulted in a kernel panic. Unfortunately, there was not sufficient
  DK>  dump space for 48GB. I will re-run this again with more drives for the
  DK>  crash dump. Anything you want me to look for in particular? (kernels
  DK>  have no KDB compiled in yet)

Well, removing a physical disk (device /dev/gpt/data0 consumed by hastd
disappears) and switching a resource to the init role (device /dev/hast/data0
consumed by the FS disappears) are two different things. Of course, you should
not normally change the resource role (destroy the hast device) before
unmounting (exporting) the FS.
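
The ordering described above (release the file system first, only then change the resource role) could be sketched roughly as follows. This is a minimal illustration, not a tested procedure: it assumes the HAST device backs a ZFS pool, the pool name "tank" is hypothetical, and "data0" is the resource name used in this thread. By default it runs in dry-run mode and only prints the commands.

```shell
#!/bin/sh
# Sketch only: export the file system before destroying /dev/hast/<resource>.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# $1 = ZFS pool living on the HAST device (hypothetical name)
# $2 = HAST resource name
release_hast_resource() {
    pool="$1"
    resource="$2"
    # Stop consuming /dev/hast/<resource> first; give up if this fails.
    run zpool export "$pool" || return 1
    # Only now switch the resource to the init role.
    run hastctl role init "$resource"
}

release_hast_resource tank data0
```

With a UFS file system the first step would be an umount of the mount point instead of a zpool export.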

Then how do I proceed with a failed drive? Or a flaky drive that is still visible to the OS, that I want to remove from HAST and replace with a different one? How do I ask HAST to switch I/O to the secondary? Is there another way to get a drive out of HAST? In any case, even if this is not an allowed operation, it should not panic.


I am now going to reboot and run the same tests without checksums.

Daniel

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
