OK, weird problem(s), if you want to call it that...
So I run a 10-OSD Ceph cluster on 4 hosts, with SSDs (Intel DC S3700) as journals.
I have a lot of mixed workloads running, and the Linux guests seem to get
corrupted in a weird way; performance is also poor.
First off:
All hosts run OpenStack, with KVM + libvirt attaching and booting the RBD
volumes.
ceph -v: ceph version 0.94.6
—————— Problem 1: Corruption:
Whenever I run fsck.ext4 -nvf /dev/vda1 inside one of the guests, I get this:
e2fsck 1.42.9 (4-Feb-2014)
Warning! /dev/vda1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 1647 has zero dtime. Fix? no
Inodes that were part of a corrupted orphan linked list found. Fix? no
Inode 133469 was part of the orphaned inode list. IGNORED.
Inode 133485 was part of the orphaned inode list. IGNORED.
Inode 133490 was part of the orphaned inode list. IGNORED.
Inode 133492 was part of the orphaned inode list. IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (8866035, counted=8865735).
Fix? no
Inode bitmap differences: -1647 -133469 -133485 -133490 -133492
Fix? no
Free inodes count wrong (2508840, counted=2509091).
Fix? no
cloudimg-rootfs: ********** WARNING: Filesystem still has errors **********
112600 inodes used (4.30%, out of 2621440)
70 non-contiguous files (0.1%)
77 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 104372/41
1619469 blocks used (15.44%, out of 10485504)
0 bad blocks
2 large files
89034 regular files
14945 directories
55 character device files
25 block device files
1 fifo
16 links
8265 symbolic links (7832 fast symbolic links)
10 sockets
------------
112351 files
So I map the image directly on a host with rbd map, and when I run
fsck.ext4 -nfv /dev/rbd0p1 I get:
e2fsck 1.42.11 (09-Jul-2014)
cloudimg-rootfs: clean, 112600/2621440 files, 1619469/10485504 blocks
So which one do I trust? I have had corrupted files on some of the images, but
I attributed that to the migration from qcow2 to raw -> Ceph.
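For what it's worth, a read-only fsck of a *mounted* filesystem races against in-flight metadata, so orphan inodes and bitmap differences are expected noise there. One way I'd try to get a trustworthy check is to fsck a point-in-time RBD snapshot instead of the live disk. This is only a sketch; pool/image names and the mapped device are placeholders for your setup:

```shell
# Hypothetical pool/image names -- substitute your own.
# Snapshot the image so we check a crash-consistent, frozen view
# rather than a filesystem that is being written to.
rbd snap create rbd/vm-disk@fsckcheck

# Map the snapshot; it comes up read-only (device number may differ).
rbd map rbd/vm-disk@fsckcheck

# Check the snapshot, not the live device. Errors here are real.
fsck.ext4 -nfv /dev/rbd1p1

# Clean up.
rbd unmap /dev/rbd1
rbd snap rm rbd/vm-disk@fsckcheck
```

If the snapshot comes back clean like your rbd map check did, the in-guest errors are almost certainly just the mounted-fs race.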
Any help is really appreciated
—————— Problem 2: Performance:
I would expect decent performance even with the Intel DC SSDs as journals, but
currently writes max out at about 200 MB/s, while reads saturate the full
10 Gbit/s.
The 10 SATA drives sit behind the journal SSDs: two SSDs serve 3 SATA drives
each, and two SSDs serve 2 SATA drives each.
fio also gives terrible results: IOPS crank up to about 5000, then dwindle
down. It looks almost like it is waiting to flush the SSDs out, or waiting on
the I/O.
The only changes I made to the base config are rbd cache = true and the
following:
ceph tell osd.* injectargs '--filestore_wbthrottle_enable=false'
ceph tell osd.* injectargs '--filestore_queue_max_bytes=1048576000'
ceph tell osd.* injectargs '--filestore_queue_committing_max_ops=5000'
ceph tell osd.* injectargs '--filestore_queue_committing_max_bytes=1048576000'
ceph tell osd.* injectargs '--filestore_queue_max_ops=200'
ceph tell osd.* injectargs '--journal_max_write_entries=1000'
ceph tell osd.* injectargs '--journal_queue_max_ops=3000'
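Note that injectargs only lasts until the OSD restarts. If these settings turn out to be worth keeping, the persistent equivalent would be a [osd] section in ceph.conf; this is the same set of values as the injectargs above, just written as a config fragment:

```ini
# /etc/ceph/ceph.conf -- persistent form of the injectargs above.
[osd]
filestore wbthrottle enable = false
filestore queue max bytes = 1048576000
filestore queue committing max ops = 5000
filestore queue committing max bytes = 1048576000
filestore queue max ops = 200
journal max write entries = 1000
journal queue max ops = 3000
```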
That's the only way I reached 200-250 MB/s; otherwise it's more like 115 MB/s,
again stalling for a flush after each wave.
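To make the wave pattern reproducible for anyone testing, something like this fio job inside a guest shows it; the filename, size, and block size are placeholders, not the exact job I ran:

```ini
; Hypothetical fio job -- sequential writes large enough to outrun
; the journal SSDs and expose the flush/stall wave pattern.
[seq-write]
ioengine=libaio
direct=1
rw=write
bs=4M
size=8G
iodepth=32
filename=/tmp/fio-testfile
```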
Can anyone give me a decent idea of how to tune this properly? Also, could
these modifications have something to do with the corruption?
Thanks again for any help :)
//Florian
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com