On 11/07/2016 11:56, Brad Hubbard wrote:
> On Mon, Jul 11, 2016 at 7:18 PM, Lionel Bouton
> <lionel-subscript...@bouton.name> wrote:
>>> On 11/07/2016 04:48, 한승진 wrote:
>>> Hi cephers.
>>>
>>> I need your help for some issues.
>>>
>>> The Ceph cluster version is Jewel (10.2.1), and the filesystem is btrfs.
>>>
>>> I run 1 mon and 48 OSDs across 4 nodes (each node has 12 OSDs).
>>>
>>> I've experienced one of the OSDs killing itself.
>>>
>>> It always issued a suicide timeout message.
>> This is probably a fragmentation problem: typical RBD access patterns
>> cause heavy BTRFS fragmentation.
> To the extent that operations take over 120 seconds to complete? Really?

Yes, really, I have run into these too. By default Ceph/RBD uses BTRFS in a
very aggressive way, rewriting data all over the place and creating/deleting
snapshots every filestore sync interval (5 seconds max by default, IIRC).
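For context, this snapshot-on-sync behaviour and the sync interval are
controlled by filestore settings in ceph.conf. The option names and the
defaults shown below are from memory for Jewel, so check them against the
documentation for your version before relying on them:

    [osd]
    # create/delete a btrfs snapshot on each filestore sync (the
    # aggressive behaviour described above); set to false to disable it
    filestore btrfs snap = true
    # bounds on the filestore sync interval, in seconds
    filestore min sync interval = 0.01
    filestore max sync interval = 5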

As I said, there are 3 main causes of performance degradation:
- the snapshots,
- the journal in a standard copy-on-write file (move it out of the
filesystem or use NoCow),
- the weak auto-defragmentation of BTRFS (autodefrag mount option).

Each one of them is enough to impact or even destroy performance in the
long run; the 3 combined make BTRFS unusable by default. This is why BTRFS
is not recommended: if you want to use it you have to be prepared for some
(heavy) tuning. The first 2 points are easy to address; for the last one
(which becomes noticeable as rewrites accumulate on your data) I'm not
aware of any other tool than the one we developed and published on GitHub
(link provided in my previous mail).
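To make the first two points concrete, here is a rough sketch of the kind
of tuning I mean. Option names are from memory and the paths/values are
purely illustrative, so adapt them to your setup and test outside
production first:

    [osd]
    # stop creating/deleting btrfs snapshots on every filestore sync
    filestore btrfs snap = false
    # mount OSD filesystems with autodefrag (weak, but better than nothing)
    osd mount options btrfs = rw,noatime,autodefrag
    # option 1: point the journal at a separate partition or SSD
    osd journal = /dev/disk/by-partlabel/journal-$id

    # option 2: keep the file-based journal but make it NoCow.
    # Roughly (with the OSD stopped):
    #   ceph-osd -i <osd-id> --flush-journal
    #   rm /var/lib/ceph/osd/ceph-<osd-id>/journal
    #   touch /var/lib/ceph/osd/ceph-<osd-id>/journal
    #   chattr +C /var/lib/ceph/osd/ceph-<osd-id>/journal
    #   ceph-osd -i <osd-id> --mkjournal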

Another thing: you should run a recent 4.1.x or 4.4.x kernel on your OSDs
if you use BTRFS. We've used it since 3.19.x, but I wouldn't advise that
kernel series now: I'd recommend 4.4.x if it's possible for you and 4.1.x
otherwise.

Best regards,

Lionel