Hello,

Correct me if I'm wrong, but isn't your configuration just twice as bad as 
running with replication size=2? With replication size=2, when you lose a disk 
you lose data if there is even one defective block found while Ceph is 
reconstructing the PGs that had a replica on the failed disk. Now, with your 
setup you have to be able to read twice as much data correctly in order to 
reconstruct the PGs. When using EC I think you have to use m>1 in 
production.
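A back-of-envelope sketch of that "twice as bad" argument (the per-shard defect 
probability p and the independence assumption are purely illustrative, not from 
the thread):

```shell
# After one device is lost, recovery must read some number of surviving
# shards intact: 1 copy for replication size=2, k=2 shards for EC k=2,m=1.
# With an assumed independent per-shard defect probability p:
awk 'BEGIN {
  p = 1e-4
  rep2 = 1 - (1 - p)^1   # replication size=2: one surviving copy must be good
  ec21 = 1 - (1 - p)^2   # EC k=2,m=1: both surviving shards must be good
  printf "rep2=%g ec21=%g ratio=%.4f\n", rep2, ec21, ec21 / rep2
}'
```

For small p the ratio comes out at essentially 2, i.e. roughly twice the 
exposure to a bad block during recovery.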


--

  Eino Tuominen


________________________________
From: ceph-users <[email protected]> on behalf of Jorge Pinilla 
López <[email protected]>
Sent: Tuesday, October 24, 2017 11:24
To: [email protected]
Subject: Re: [ceph-users] Erasure Pool OSD fail


Okay, I think I can answer myself: the pool is created with a default min_size 
of 3, so when one of the OSDs goes down the pool doesn't perform any I/O. 
Manually changing the pool's min_size to 2 worked great.
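For reference, that change can be made with the standard ceph CLI; the pool 
name "ecpool" below is a placeholder:

```shell
# Inspect the current min_size on the EC pool ("ecpool" is illustrative):
ceph osd pool get ecpool min_size
# The pool here defaulted to min_size = 3; dropping it to k = 2 lets I/O
# continue with one OSD down, at the cost of running with no redundancy
# until the failed OSD recovers:
ceph osd pool set ecpool min_size 2
```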

On 24/10/2017 at 10:13, Jorge Pinilla López wrote:
I am testing erasure-coded pools and doing a rados bench write to try fault 
tolerance.
I have 3 nodes with 1 OSD each, K=2, M=1.

While performing the write (rados bench -p replicate 100 write), I stop one of 
the OSD daemons (for example osd.0) to simulate a node failure, and then the 
whole write stops and I can't write any data anymore.

  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    1      16        28        12   46.8121        48     1.01548    0.616034
    2      16        40        24   47.3907        48     1.04219    0.923728
    3      16        52        36   47.5889        48    0.593145      1.0038
    4      16        68        52   51.6633        64     1.39638     1.08098
    5      16        74        58    46.158        24     1.02699     1.10172
    6      16        83        67   44.4711        36     3.01542     1.18012
    7      16        95        79   44.9722        48    0.776493     1.24003
    8      16        95        79   39.3681         0           -     1.24003
    9      16        95        79   35.0061         0           -     1.24003
   10      16        95        79   31.5144         0           -     1.24003
   11      16        95        79   28.6561         0           -     1.24003
   12      16        95        79   26.2732         0           -     1.24003

It's pretty clear where the OSD failed.

On the other hand, using a replicated pool, the client (rados bench) doesn't 
even notice the OSD failure, which is awesome.

Is this normal behaviour for EC pools?
________________________________
Jorge Pinilla López
[email protected]<mailto:[email protected]>
Computer engineering student
Intern, systems area (SICUZ)
Universidad de Zaragoza
PGP-KeyID: 
A34331932EBC715A<http://pgp.rediris.es:11371/pks/lookup?op=get&search=0xA34331932EBC715A>
________________________________



_______________________________________________
ceph-users mailing list
[email protected]<mailto:[email protected]>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

