Hi Wido,

no, I did not set any special flags - I used ceph-deploy without further
parameters apart from the journal disk/partition that these OSDs should use.
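
For reference, it was essentially the stock invocation - something like the
following, with host and device names as placeholders (and the --fs-type
option from memory, so treat it as a sketch):

$ ceph-deploy osd create --fs-type btrfs osd-host:/dev/sdX:/dev/nvme0n1p1

i.e. just the filesystem type plus the data disk and journal device;
mkfs.btrfs ran with its defaults.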

Bernhard

Wido den Hollander <[email protected]> wrote on Mon, 13 Feb 2017 at
17:47:

>
> > On 13 February 2017 at 16:49, "Bernhard J. M. Grün"
> > <[email protected]> wrote:
> >
> >
> > Hi,
> >
> > we are using SMR disks for backup purposes in our Ceph cluster.
> > We had massive problems with those disks before upgrading to kernel
> > 4.9.x. We also dropped XFS as the filesystem and now use btrfs (only
> > for those disks).
> > Since then we haven't had such problems anymore.
> >
>
> We have kernel 4.9 there, but XFS is not SMR-aware, so that doesn't help.
>
> I saw posts that some SMR work for XFS is on its way, but it doesn't seem
> to be actively developed. What I did see, however, is that you need to
> pass some flags at mkfs time.
>
> Did you need to do that when formatting btrfs on the SMR disks?
>
> Wido
>
> > If you don't like btrfs you could try to use a journal disk for XFS
> > itself in addition to the journal disk for Ceph. I assume this will
> > also solve many problems, as the XFS journal is rewritten often and SMR
> > disks don't like rewrites; see the sketch below.
> > I think that is one reason why btrfs works more smoothly with those
> > disks.
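> >
> > A rough sketch of what I mean (untested here; the device names are just
> > placeholders, with the SSD partition holding the external XFS log):
> >
> > $ mkfs.xfs -l logdev=/dev/nvme0n1p2,size=64m /dev/sdX
> > $ mount -o logdev=/dev/nvme0n1p2 /dev/sdX /var/lib/ceph/osd/ceph-0
> >
> > That would keep the frequently rewritten XFS log off the SMR disk
> > entirely.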
> >
> > Hope this helps
> >
> > Bernhard
> >
> > Wido den Hollander <[email protected]> wrote on Mon, 13 Feb 2017 at
> > 16:11:
> >
> > >
> > > > On 13 February 2017 at 15:57, Peter Maloney
> > > > <[email protected]> wrote:
> > > >
> > > >
> > > > Then you're not aware of what SMR disks do. They are just slow for
> > > > all writes: because the tracks overlap, the drive has to read the
> > > > surrounding tracks and write them all back again, instead of just
> > > > the one thing you really wanted to write. To partially mitigate
> > > > this, they have a small write buffer, something like 8GB of flash,
> > > > which gives them "normal" speed until it's full; once it is, you
> > > > crawl (at least that is what the Seagate ones do). Journals aren't
> > > > designed to solve that... they help reduce the sync load on the OSD,
> > > > but they don't somehow make the throughput higher (at least not
> > > > sustained). Even if the journal were perfectly designed for
> > > > performance, it would still do absolutely nothing while it's full
> > > > and the disk is still busy flushing the old data.
> > > >
> > >
> > > Well, that explains it indeed. I wasn't aware of the additional
> > > buffer inside an SMR disk; that makes the behavior a lot clearer.
> > >
> > > I was asked to look at this system for somebody who bought SMR disks
> > > without knowing. As I never touch these disks I found the behavior odd.
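> > >
> > > A quick back-of-the-envelope check (assuming the ~8GB buffer figure
> > > above): at ~30MB/s of incoming writes per disk, a buffer that can't
> > > drain at all would fill in about 8192/30 ≈ 270 seconds; with partial
> > > draining in between it would take a few times longer, which roughly
> > > matches the ~15 minutes before the disks go to 100% busy.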
> > >
> > > SMR shouldn't be used with Ceph without proper SMR support in
> > > BlueStore or an SMR-aware XFS.
> > >
> > > Wido
> > >
> > > >
> > > > On 02/13/17 15:49, Wido den Hollander wrote:
> > > > > Hi,
> > > > >
> > > > > I have an odd case with SMR disks in a Ceph cluster. Before I
> > > > > continue: yes, I am fully aware that SMR and Ceph don't play along
> > > > > well, but something is happening here that I'm not able to fully
> > > > > explain.
> > > > >
> > > > > On a 2x replica cluster with 8TB Seagate SMR disks I can write at
> > > > > about 30MB/sec to each disk using a simple RADOS bench:
> > > > >
> > > > > $ rados -p <pool> bench 60 write -t 1
> > > > > $ time rados -p <pool> put 1GB.bin 1GB.bin
> > > > >
> > > > > Both ways I found that the disk can write at that rate.
> > > > >
> > > > > Now, when I start a benchmark with 32 threads, it writes fine.
> > > > > Not super fast, but it works.
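> > > > > (something like: $ rados -p <pool> bench 900 write -t 32, with a
> > > > > duration long enough to hit the problem)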
> > > > >
> > > > > After 15 minutes or so, various disks go to 100% busy and just
> > > > > stay there. These OSDs get marked down and some even commit
> > > > > suicide due to threads timing out.
> > > > >
> > > > > Stopping the RADOS bench and starting the OSDs again resolves the
> > > > > situation.
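> > > > > (by "starting the OSDs again" I mean e.g. $ systemctl restart
> > > > > ceph-osd@<id> on the affected hosts)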
> > > > >
> > > > > I am trying to explain what's happening. I'm aware that SMR isn't
> > > > > very good at random writes. To partially overcome this there are
> > > > > Intel DC S3510s in there as journal SSDs.
> > > > >
> > > > > Can anybody explain why this 100% busy state pops up after 15
> > > > > minutes or so?
> > > > >
> > > > > Obviously it would be best if BlueStore had SMR support, but for
> > > > > now it's just FileStore with XFS on there.
> > > > >
> > > > > Wido
> > >
> > --
> > Kind regards
> >
> > Bernhard J. M. Grün, Püttlingen, Germany
>
-- 
Kind regards

Bernhard J. M. Grün, Püttlingen, Germany
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
