Hi,

Yes, I realised that you are correct in that it's not twice as bad, it's just 
as bad. I made a trivial error when doing the math in my head, which made 
this case of erasure coding look worse than it is.


But I still stand by my previous statement: with m=1 you will lose data, so it 
must not be used in production.
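
For example, a minimal sketch of a profile with m=2 instead (the profile and 
pool names are just placeholders, and it assumes the cluster has enough hosts 
to satisfy k+m with a host failure domain):

    ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 64 64 erasure ec22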


--

  Eino Tuominen



________________________________
From: Jorge Pinilla López <[email protected]>
Sent: Wednesday, October 25, 2017 01:37
To: Eino Tuominen; [email protected]
Subject: Re: [ceph-users] Erasure Pool OSD fail


Well, you should use M > 1; the more you have, the less risk and the better the performance.

You don't read twice as much data; you read it from different sources. 
Furthermore, you may even read less data and have to rebuild it, because on 
erasure pools you don't replicate the data.


On the other hand, the configuration isn't as bad as you think, it's just 
different.

3-node cluster

Replicated pool, size = 2

    - you can tolerate 1 failure, then rebalance and tolerate another failure 
(max 2 separate failures)

    -you use 2*data space

    - you have to write 2*data: the full data on one node and the full data on 
the second one.

Erasure code pool

    -you can only lose 1 node

    -you use less space

    - as you don't write 2*data, writes are also faster. You write half of the 
data on one node, half on the other, and the parity on a separate node, so the 
write work is a lot more distributed (see the worked example after this list).

    -reads are slower because you need all the data parts.
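
As a worked example of the space and write overhead (the numbers just 
illustrate k=2, m=1 on 3 nodes), take a 1 GB object:

    Replicated size=2:   node A 1 GB + node B 1 GB                     = 2.0 GB written
    Erasure k=2, m=1:    node A 0.5 GB + node B 0.5 GB + node C 0.5 GB = 1.5 GB written
                         (two data chunks plus one parity chunk)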


In both configurations, if you have corrupted data you lose your data, so 
that's not really a point of comparison.

A replicated pool can handle far more read-intensive workloads, while erasure 
pools are intended for big writes but relatively few reads.


I have checked myself that both configurations can work with a 3-node cluster, 
so it's not a matter of a better and a worse configuration, it really depends 
on your workload. And the best thing :) you can have both on the same OSDs!
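
For instance, a minimal sketch of creating one pool of each type on the same 
OSDs (pool/profile names and PG counts are only examples):

    # erasure-coded pool using a k=2, m=1 profile
    ceph osd erasure-code-profile set k2m1 k=2 m=1 crush-failure-domain=host
    ceph osd pool create ecpool 64 64 erasure k2m1

    # replicated pool with size=2 on the same cluster
    ceph osd pool create reppool 64 64 replicated
    ceph osd pool set reppool size 2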

On 24/10/2017 at 12:37, Eino Tuominen wrote:

Hello,


Correct me if I'm wrong, but isn't your configuration just twice as bad as 
running with replication size=2? With replication size=2, when you lose a disk 
you lose data if there is even one defective block found while Ceph is 
reconstructing the PGs that had a replica on the failed disk. Now, with your 
setup you have to be able to read twice as much data correctly in order to 
reconstruct the PGs. When using EC I think you have to use m>1 in 
production.


--

  Eino Tuominen


________________________________
From: ceph-users <[email protected]> on behalf of Jorge Pinilla López <[email protected]>
Sent: Tuesday, October 24, 2017 11:24
To: [email protected]<mailto:[email protected]>
Subject: Re: [ceph-users] Erasure Pool OSD fail


Okay, I think I can answer myself: the pool is created with a default min_size 
of 3, so when one of the OSDs goes down the pool doesn't perform any IO. 
Manually changing the pool min_size to 2 worked great.
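
For reference, a minimal sketch of checking and lowering it (assuming the pool 
is called ecpool):

    ceph osd pool get ecpool min_size      # was 3 by default in this case
    ceph osd pool set ecpool min_size 2    # allow IO with only the k data chunks available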

On 24/10/2017 at 10:13, Jorge Pinilla López wrote:
I am testing erasure code pools and doing a rados bench write to test fault 
tolerance.
I have 3 nodes with 1 OSD each, K=2 M=1.

While performing the write (rados bench -p replicate 100 write), I stop one of 
the OSD daemons (for example osd.0) to simulate a node failure, and then the 
whole write stops and I can't write any data anymore.
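
(For anyone who wants to reproduce it, the steps were roughly the following; 
this assumes a systemd-based deployment, and osd.0 is just the OSD I happened 
to stop.)

    rados bench -p replicate 100 write &
    systemctl stop ceph-osd@0    # stop one OSD mid-write to simulate a node failure

The rados bench output was: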

    1      16        28        12   46.8121        48     1.01548    0.616034
    2      16        40        24   47.3907        48     1.04219    0.923728
    3      16        52        36   47.5889        48    0.593145      1.0038
    4      16        68        52   51.6633        64     1.39638     1.08098
    5      16        74        58    46.158        24     1.02699     1.10172
    6      16        83        67   44.4711        36     3.01542     1.18012
    7      16        95        79   44.9722        48    0.776493     1.24003
    8      16        95        79   39.3681         0           -     1.24003
    9      16        95        79   35.0061         0           -     1.24003
   10      16        95        79   31.5144         0           -     1.24003
   11      16        95        79   28.6561         0           -     1.24003
   12      16        95        79   26.2732         0           -     1.24003

It's pretty clear where the OSD failed.

On the other hand, using a replicated pool the client (rados test) doesn't even 
notice the OSD failure, which is awesome.

Is this normal behaviour for EC pools?
________________________________
Jorge Pinilla López
[email protected]<mailto:[email protected]>
Estudiante de ingenieria informática
Becario del area de sistemas (SICUZ)
Universidad de Zaragoza
PGP-KeyID: 
A34331932EBC715A<http://pgp.rediris.es:11371/pks/lookup?op=get&search=0xA34331932EBC715A>
________________________________



_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
