Re: [DRBD-user] Drbd/pacemaker active/passive san failover

Adam Goryachev Mon, 19 Sep 2016 04:05:17 -0700


On 19/09/2016 19:06, Marco Marino wrote:

2016-09-19 10:50 GMT+02:00 Igor Cicimov <[email protected]>:
    On 19 Sep 2016 5:45 pm, "Marco Marino" <[email protected]> wrote:
    >
    > Hi, I'm trying to build an active/passive cluster with DRBD and
    Pacemaker for a SAN. I'm using 2 nodes with one RAID controller
    (MegaRAID) on each one. Each node has an SSD that works as a
    cache for reads (and writes?) using the proprietary CacheCade
    technology.
    >
    Did you configure CacheCade? If the write cache was enabled in
    write-back mode, then suddenly removing the device from under the
    controller would, I guess, have caused serious problems, since the
    controller expects to write to the SSD cache first and then flush
    to the HDDs. Maybe this explains the read-only mode?
Good point. It is exactly as you wrote. How can I mitigate this behavior in a clustered (active/passive) environment? As I said in the other post, I think the best solution is to power off the node using the local-io-error handler and switch all resources to the other node. But please give me some suggestions.

    > Basically, the structure of the SAN is:
    >
    > Physical disks -> RAID -> device /dev/sdb in the OS -> DRBD
    resource (using /dev/sdb as its backing device) (Pacemaker
    master/slave resource) -> VG (managed with Pacemaker) -> iSCSI
    target (managed with Pacemaker) -> iSCSI LUNs (one for each logical
    volume in the VG, managed with Pacemaker)
    >
    > A few days ago, the SSD was wrongly removed from the primary
    node of the cluster, and this caused a lot of problems: the DRBD
    resource and all logical volumes went into read-only mode with a lot
    of I/O errors, but the cluster did not switch to the other node.
    All filesystems on the initiators went into read-only mode. There are 2
    problems involved here (I think): 1) Why does removing the SSD
    cause read-only mode with I/O errors? This means that the SSD is
    a single point of failure for a single-node SAN with MegaRAID
    controllers and CacheCade technology. And 2) why didn't DRBD work
    as expected?
    What was the state in /proc/drbd?

I think you will need to examine the logs to find out what happened. It would appear (just making a wild guess) that the caching is happening between DRBD and iSCSI instead of between DRBD and the RAID. If the failure happened under DRBD, then DRBD should have seen the read/write error and automatically failed the local storage. It wouldn't necessarily fail over to the secondary, but it would do all reads and writes via the secondary node. The fact that this didn't happen makes it look like the failure happened above DRBD.


At least that is my understanding of how it will work in that scenario.
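Igor's question about /proc/drbd and the reasoning above both come down to the local disk state: if DRBD saw the error and dropped its backing device, the ds: field for the local side should read Diskless. Below is a minimal Python sketch for pulling that out of /proc/drbd, assuming the DRBD 8.3/8.4 status-line format (DRBD 9 moved per-resource status out of /proc/drbd). The helper names and the SAMPLE text are hypothetical and illustrative only, not taken from the incident.

```python
import os
import re
from typing import Dict, List

# Hypothetical DRBD 8.x /proc/drbd output (illustrative only, not from the
# incident): local disk detached, peer disk still UpToDate.
SAMPLE = """version: 8.4.7 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""

# One resource status line: minor number, connection state (cs),
# roles (ro: local/peer) and disk states (ds: local/peer).
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+ro:(?P<local_role>\w+)/(?P<peer_role>\w+)"
    r"\s+ds:(?P<local_disk>\w+)/(?P<peer_disk>\w+)"
)


def parse_proc_drbd(text: str) -> List[Dict[str, str]]:
    """Return one dict per configured resource found in /proc/drbd text."""
    resources = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        # Skips the version line, the counter lines and Unconfigured minors.
        if match:
            resources.append(match.groupdict())
    return resources


def local_disk_detached(resource: Dict[str, str]) -> bool:
    """True if the local side has no backing device (e.g. detached after an I/O error)."""
    return resource["local_disk"] == "Diskless"


if __name__ == "__main__":
    path = "/proc/drbd"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE  # fall back to the illustrative sample
    for res in parse_proc_drbd(text):
        state = "DETACHED" if local_disk_detached(res) else "attached"
        print(f"minor {res['minor']}: cs={res['cs']} role={res['local_role']} "
              f"disk={res['local_disk']}/{res['peer_disk']} local disk {state}")
```

A local Diskless with the peer UpToDate on the primary would be consistent with DRBD having failed the local storage as described; if the local disk still reports UpToDate while the initiators see I/O errors, DRBD did not detach, and the logs are the next place to look.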

Regards,
Adam

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
