Thanks Christian,
I'm using a pool with size 3, min_size 1.
I can see the cluster serving I/O in a degraded state after the OSD is marked
down, but the problem we have is in the interval between the OSD failure
event and the moment when that OSD is actually marked down.
In that interval (which can take up to 10 minutes) all I/O
operations directed to that OSD are blocked, so all the virtual
machines using RBDs provided by the cluster hang until the failed
OSD is finally marked down.
Is this the expected behavior of the cluster during a failure?
Is it possible to shorten that interval so I/O operations aren't
blocked for so long?
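For reference, these are the options I believe govern how quickly a failed OSD is marked down and out. The values below are what I understand to be the Jewel defaults, copied here for illustration rather than taken from our running cluster, so please correct me if any of them are wrong:

```ini
[global]
# Seconds without a heartbeat before peer OSDs report an OSD as failed
# (default 20).
osd heartbeat grace = 20
# Seconds between heartbeat pings among OSDs (default 6).
osd heartbeat interval = 6
# Number of OSDs that must report a peer as down before the monitors
# mark it down (default 1).
mon osd min down reporters = 1
# Seconds after being marked down before an OSD is marked out and
# re-replication begins (default 600, i.e. 10 minutes). Note this
# controls marking *out*, not *down*.
mon osd down out interval = 600
```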
Thanks,
On 11/04/2016 07:25 PM, Christian Wuerdig wrote:
What are your pool size and min_size settings? An object with fewer
than min_size replicas will not receive I/O
(http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas).
So with size=2 and min_size=1, an OSD failure means blocked
operations on all objects located on the failed OSD until they have
been replicated again.
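In case it helps, the current replication settings can be checked and changed per pool with the commands below (the pool name "rbd" is just an example; substitute your own):

```shell
# Inspect the replication settings of a pool
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# With size=3, setting min_size=1 keeps I/O flowing as long as at
# least one replica of each object remains available
ceph osd pool set rbd min_size 1
```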
On Sat, Nov 5, 2016 at 9:04 AM, fcid <[email protected]> wrote:
Dear ceph community,
I'm working on a small Ceph deployment for testing purposes, in
which I want to test the high-availability features of Ceph and
how clients are affected during outages in the cluster.
This small cluster is deployed on 3 servers, each running 2 OSDs
and 1 monitor, and we are using it to serve RADOS block devices
to KVM hypervisors on other hosts. The Ceph software was installed
using ceph-deploy.
For HA testing we simulate disk failures by physically detaching
OSD disks from the servers and also by cutting power to the
servers we want to fail.
I have some questions regarding the behavior during OSD and disk
failures under light workloads.
During disk failures, the cluster takes a long time to promote the
secondary OSD to primary, blocking all disk operations of virtual
machines using RBD until the cluster map is updated with the
failed OSD (which can take up to 10 minutes in our cluster).
Is this the expected behavior of the OSD cluster, or should disk
failures be transparent to clients?
Thanks in advance, kind regards.
Configuration and version of our ceph cluster:
root@ceph00:~# cat /etc/ceph/ceph.conf
[global]
fsid = 440fce60-3097-4f1c-a489-c170e65d8e09
mon_initial_members = ceph00
mon_host = 192.168.x1.x1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.x.x/x
cluster network = y.y.y.y/y
[osd]
osd mkfs options = -f -i size=2048 -n size=64k
osd mount options xfs = inode64,noatime,logbsize=256k
osd journal size = 20480
filestore merge threshold = 40
filestore split multiple = 8
filestore xattr use omap = true
root@ceph00:~# ceph -v
ceph version 10.2.3
--
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
http://www.altavoz.net
Viña del Mar, Valparaiso:
2 Poniente 355 of 53
+56 32 276 8060
Santiago:
San Pío X 2460, oficina 304, Providencia
+56 2 2585 4264
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com