Hi Wido,

no, I did not set any special flags - I used ceph-deploy without further parameters, apart from the journal disk/partition these OSDs should use; roughly as sketched below.
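If it helps, the shape of the call was something like this (host and device names here are placeholders, not our real ones):

  # one data disk plus a journal partition per OSD, nothing else;
  # the filesystem type for these OSDs is btrfs, set e.g. via
  # "osd mkfs type = btrfs" in ceph.conf or --fs-type btrfs
  $ ceph-deploy osd create osd-host:/dev/sdb:/dev/sdc1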
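On the idea (quoted below) of giving XFS its own journal device: we have not tried that here, but mechanically it would look about like this, again with placeholder devices - the frequently rewritten XFS log then lives on the SSD instead of the SMR disk:

  # put the XFS log on an SSD partition instead of the SMR disk
  $ mkfs.xfs -l logdev=/dev/nvme0n1p1,size=128m /dev/sdb
  $ mount -o logdev=/dev/nvme0n1p1 /dev/sdb /var/lib/ceph/osd/ceph-0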
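For anyone wanting to reproduce the benchmarks quoted below, the abbreviated rados commands spelled out would be something like this (pool name and durations are placeholders):

  $ rados bench -p testpool 60 write -t 1          # single writer, ~30MB/s per disk
  $ time rados -p testpool put 1GB.bin ./1GB.bin   # timed upload of one 1GB object
  $ rados bench -p testpool 900 write -t 32        # 32 writers; disks stall after ~15 min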
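Peter's buffer explanation below also fits the ~15-minute timing, if you assume his ~8 GB figure is about right: 8 GB / (15 min * 60 s) is roughly 9 MB/s, so as soon as the 32-thread bench feeds each disk around 9 MB/s more than the drive can destage in the background, the buffer runs dry after roughly a quarter of an hour and the disk falls back to its raw shingled rewrite speed - which would be the moment the disks pin at 100% busy.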
Bernhard

Wido den Hollander <[email protected]> wrote on Mon, 13 Feb 2017 at 17:47:

> > On 13 February 2017 at 16:49, "Bernhard J. M. Grün"
> > <[email protected]> wrote:
> >
> > Hi,
> >
> > we are using SMR disks for backup purposes in our Ceph cluster.
> > We have had massive problems with those disks prior to upgrading to
> > kernel 4.9.x. We also dropped XFS as the filesystem and now use btrfs
> > (only for those disks).
> > Since we did this we haven't had such problems anymore.
> >
>
> We have kernel 4.9 there, but XFS is not SMR-aware, so it doesn't help.
>
> I saw posts that some XFS work is on its way, but it's not being actively
> developed. What I did see, however, is that you need to pass some flags
> to mkfs.
>
> Did you need to do that when formatting btrfs on the SMR disks?
>
> Wido
>
> > If you don't like btrfs you could try to use a journal disk for XFS
> > itself and also a journal disk for Ceph. I assume this would also solve
> > many problems, as the XFS journal is rewritten often and SMR disks
> > don't like rewrites.
> > I think that is one reason why btrfs works more smoothly with those
> > disks.
> >
> > Hope this helps
> >
> > Bernhard
> >
> > Wido den Hollander <[email protected]> wrote on Mon, 13 Feb 2017 at
> > 16:11:
> >
> > > On 13 February 2017 at 15:57, Peter Maloney
> > > <[email protected]> wrote:
> > >
> > > > Then you're not aware of what the SMR disks do. They are just slow
> > > > for all writes: because the tracks overlap, the drive has to read
> > > > the surrounding tracks and then write them all back again instead
> > > > of just the one thing you really wanted to write. To partially
> > > > mitigate this, they have a small write buffer, something like 8 GB
> > > > of flash, which gives them the "normal" speed; when it's full, you
> > > > crawl (at least this is what the Seagate ones do). Journals aren't
> > > > designed to solve that... they help with the sync load on the OSD,
> > > > but they don't somehow make the throughput higher (at least not
> > > > sustained). Even if the journal were perfectly designed for
> > > > performance, it would still do absolutely nothing once it's full
> > > > and the disk is still busy flushing the old writes.
> > >
> > > Well, that indeed explains it. I wasn't aware of the additional
> > > buffer inside an SMR disk.
> > >
> > > I was asked to look at this system for somebody who bought SMR disks
> > > without knowing. As I never touch these disks I found the behavior
> > > odd.
> > >
> > > The buffer explains it a lot better; I wasn't aware that SMR disks
> > > have that.
> > >
> > > SMR shouldn't be used in Ceph without proper support in BlueStore or
> > > an SMR-aware XFS.
> > >
> > > Wido
> > >
> > > > On 02/13/17 15:49, Wido den Hollander wrote:
> > > > > Hi,
> > > > >
> > > > > I have an odd case with SMR disks in a Ceph cluster. Before I
> > > > > continue: yes, I am fully aware of SMR and Ceph not playing along
> > > > > well, but there is something happening here which I'm not able to
> > > > > fully explain.
> > > > >
> > > > > On a 2x replica cluster with 8TB Seagate SMR disks I can write at
> > > > > about 30MB/sec to each disk using a simple RADOS bench:
> > > > >
> > > > > $ rados bench -t 1
> > > > > $ time rados put 1GB.bin
> > > > >
> > > > > Both ways I found that the disks can write at that rate.
> > > > >
> > > > > Now, when I start a benchmark with 32 threads it writes fine. Not
> > > > > super fast, but it works.
> > > > >
> > > > > After 15 minutes or so various disks go to 100% busy and just
> > > > > stay there. These OSDs are then marked down and some even commit
> > > > > suicide due to threads timing out.
> > > > >
> > > > > Stopping the RADOS bench and starting the OSDs again resolves the
> > > > > situation.
> > > > >
> > > > > I am trying to explain what's happening. I'm aware that SMR isn't
> > > > > very good at random writes. To partially overcome this there are
> > > > > Intel DC S3510s in there as journal SSDs.
> > > > >
> > > > > Can anybody explain why this 100% busy pops up after 15 minutes
> > > > > or so?
> > > > >
> > > > > Obviously it would be best if BlueStore had SMR support, but for
> > > > > now it's just FileStore with XFS on there.
> > > > >
> > > > > Wido
> >
> > --
> > Kind regards
> >
> > Bernhard J. M. Grün, Püttlingen, Germany

--
Kind regards

Bernhard J. M. Grün, Püttlingen, Germany
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
