On 19 Sep 2016 5:45 pm, "Marco Marino" <marino....@gmail.com> wrote:
> Hi, I'm trying to build an active/passive cluster with drbd and pacemaker
for a SAN. I'm using 2 nodes, each with one RAID controller (megaraid).
Each node has an SSD disk that works as a cache for reads (and writes?)
using the proprietary CacheCade technology.
Did you configure the CacheCade? If the write cache was enabled in
write-back mode, then suddenly removing the device from under the controller
would have caused serious problems, I guess, since the controller expects to
write to the SSD cache first and then flush to the HDDs. Maybe this
explains the read-only mode?
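As a sketch (assuming the LSI MegaCli utility is available; the binary name varies by install, e.g. MegaCli64 on some systems), the current cache policy of the logical drives can be inspected with something like:

```
# Show logical drive info, including the current cache policy
# (look for "Current Cache Policy: WriteBack ..." vs "WriteThrough")
MegaCli -LDInfo -LAll -aAll
```

If the policy reports WriteBack, data sitting in the SSD cache at the moment it was pulled would not yet have reached the HDDs, which would be consistent with the controller forcing the volume read-only.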

> Basically, the structure of the SAN is:
> Physical disks -> RAID -> device /dev/sdb in the OS -> DRBD resource
(using /dev/sdb as its backend, managed by pacemaker as a master/slave
resource) -> VG (managed with pacemaker) -> iSCSI target (with pacemaker)
-> iSCSI LUNs (one for each logical volume in the VG, managed with
pacemaker)
> A few days ago, the SSD disk was accidentally removed from the primary node
of the cluster and this caused a lot of problems: the DRBD resource and all
logical volumes went into read-only mode with a lot of I/O errors, but the
cluster did not switch to the other node. All filesystems on the initiators
went read-only. There are 2 problems involved here (I think): 1) Why does
removing the SSD disk cause read-only mode with I/O errors? This means
that the SSD is a single point of failure for a single-node SAN with
megaraid controllers and CacheCade technology..... and 2) Why did DRBD not
work as expected?
What was the state in /proc/drbd ?
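For reference, a healthy DRBD 8.x resource in /proc/drbd looks roughly like this (illustrative output, not from the poster's cluster); after a backing-device failure you would normally expect the disk state on the affected node to drop to Diskless:

```
version: 8.4.x (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:... nr:... dw:... dr:... al:... bm:... lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
```

The cs: (connection state), ro: (roles), and ds: (disk states) fields are the ones that tell you whether DRBD noticed the failure at all.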

> For point 1) I'm checking with the vendor, and I doubt that I can do much.
> For point 2) I have errors in the DRBD configuration. My idea is that
when an I/O error happens on the primary node, the cluster should switch to
the secondary node and shut down the damaged node.
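A related knob worth checking before touching the handlers (a sketch of the relevant global_common.conf fragment, DRBD 8.4 syntax): the disk section's on-io-error policy controls how DRBD reacts when the backing device reports I/O errors. With detach, the node drops its local disk and continues diskless, serving I/O through the peer instead of passing errors up to LVM and iSCSI:

```
disk {
    # On a lower-level I/O error, detach from the local backing
    # device and continue diskless, reading/writing via the peer.
    on-io-error detach;
}
```

With the default pass-through behavior, errors from the failed backing device propagate straight to the upper layers, which would match the read-only filesystems you saw on the initiators.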
> Here -> http://pastebin.com/79dDK66m you can see the current
drbd configuration, but I need to change a lot of things and I want to
share my ideas here:
> 1) The "handlers" section should be moved into the "common" section of
global_common.conf instead of the resource file.
> 2) I'm thinking to modify the "handlers" section as follows:
> handlers {
>     pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>     pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>     local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>     # Hook into Pacemaker's fencing.
>     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> }
> In this way, when an I/O error happens, the node will be powered off and
pacemaker will switch resources to the other node (or at least won't
create problematic behavior...)
> 3) I'm thinking to move the "fencing" directive from the resource file to
global_common.conf. Furthermore, I want to change it to:
> fencing resource-and-stonith;
> 4) Finally, in the global "net" section I need to add:
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
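Putting points 3) and 4) together, the relevant part of global_common.conf would look roughly like this (a sketch, DRBD 8.4 syntax; note that in DRBD 8.x the fencing keyword lives in the disk section, while in DRBD 9 it moved to the net section):

```
common {
    disk {
        # Freeze I/O and invoke the fence-peer handler before
        # resuming, rather than only disconnecting.
        fencing resource-and-stonith;
    }
    net {
        # Automatic split-brain recovery policies:
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
}
```

With resource-and-stonith, DRBD suspends I/O until the fence-peer handler (crm-fence-peer.sh above) confirms the peer has been dealt with, which is what ties the DRBD side into Pacemaker's fencing.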
> At the end of the work, the configuration will be ->
> Please, give me suggestions about mistakes and possible changes.
> Thank you
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user