Sounds to me like you've put yourself at too much risk - *if* I'm reading
your message right about your configuration, you have multiple hosts
accessing OSDs that live on a single shared box. That box is a single
point of failure for multiple nodes: if it goes down, several replicas
can disappear at the same time, and any placement group whose primary
and replica both sit on OSDs within that single shared storage system
will stop serving IO until they come back.
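
If you want to verify that's what happened before the next test, a few
commands are worth running (the pool name "rbd" below is just a
placeholder for whichever pool your client IO targets):

    # replication size vs. the minimum replicas required to serve IO;
    # with size=2 and min_size=2, losing one chassis blocks IO by design
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # confirm the CRUSH rule actually separates replicas across chassis
    # (look for "step chooseleaf firstn 0 type chassis")
    ceph osd crush rule dump

    # during the outage, these would show which PGs went inactive
    ceph health detail
    ceph pg dump_stuck inactive

If min_size turns out to be 2, then with a replication count of 2 every
PG drops below min_size the moment one enclosure dies, and the cluster
pauses IO until it returns - that would explain what you saw even if the
replicas were placed correctly. Lowering min_size to 1 would let IO
continue, at the cost of running on a single copy during the outage.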

On Thu, Jul 9, 2015 at 5:42 AM, Mallikarjun Biradar <
mallikarjuna.bira...@gmail.com> wrote:

> Hi all,
>
> Setup details:
> Two storage enclosures, each connected to 4 OSD nodes (shared storage).
> The failure domain is chassis (enclosure) level, and the replication
> count is 2. Each host is allotted 4 drives.
>
> I have active client IO running on the cluster (random write profile
> with 4M block size & 64 queue depth).
>
> One of the enclosures had a power loss, so all OSDs on the hosts
> connected to that enclosure went down, as expected.
>
> But client IO paused. After some time the enclosure & the hosts
> connected to it came back up, and all OSDs on those hosts came up.
>
> Until then, the cluster was not serving IO. Once all hosts & OSDs
> pertaining to that enclosure came up, client IO resumed.
>
>
> Can anybody help me understand why the cluster was not serving IO
> during the enclosure failure? Or is it a bug?
>
> -Thanks & regards,
> Mallikarjun Biradar