Hi Tyler,

I suspect you have the BlueStore DB/WAL on these drives as well, don't you?
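
You can check with something like this (osd.0 is just an example id):

# the OSD metadata dump records which devices back the data and the db/wal
ceph osd metadata 0 | grep -Ei 'devices|bluefs|rotational'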

If so, you may be hitting performance issues with f[data]sync requests, which the DB/WAL invokes pretty frequently.

See the following links for details:

https://www.percona.com/blog/2018/02/08/fsync-performance-storage-devices/

https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

The latter link shows pretty poor numbers for M500DC drives.
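
If you want to re-check your exact drives, the test from the second link boils down to single-threaded 4k O_DSYNC writes, roughly like this (fio must be installed; the file name here is just an example, using a scratch file rather than the raw device so the test is non-destructive):

# 4k sequential writes, queue depth 1, every write synced -- journal/WAL-style load
fio --name=wal-sync-test --filename=/tmp/fio-sync-test --size=1G \
    --rw=write --bs=4k --iodepth=1 --numjobs=1 --direct=1 --sync=1 \
    --runtime=60 --time_based --group_reporting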


Thanks,

Igor


On 12/11/2018 4:58 AM, Tyler Bishop wrote:

Older Crucial/Micron M500/M600
_____________________________________________

*Tyler Bishop*
EST 2007


O:513-299-7108 x1000
M:513-646-5809
http://BeyondHosting.net




On Mon, Dec 10, 2018 at 8:57 PM Christian Balzer <ch...@gol.com> wrote:

    Hello,

    On Mon, 10 Dec 2018 20:43:40 -0500 Tyler Bishop wrote:

    > I don't think that's my issue here because I don't see any IO to
    > justify the latency.  Unless the IO is minimal and it's Ceph issuing a
    > bunch of discards to the SSD, and that's causing it to slow down while
    > doing that.
    >

    What does atop have to say?

    Discards/Trims are usually visible in it; this is during an fstrim of a
    RAID1 "/":
    ---
    DSK |          sdb  | busy     81% |  read       0 | write  8587 | MBw/s 2323.4 |  avio 0.47 ms |
    DSK |          sda  | busy     70% |  read       2 | write  8587 | MBw/s 2323.4 |  avio 0.41 ms |
    ---

    The numbers tend to be a lot higher than what the actual interface is
    capable of; clearly the SSD is reporting its internal activity.
    In any case, it should give a good insight into what is going on
    activity-wise.
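
    For example, something like this (sdX is just a placeholder for whichever
    SSD you want to look at):

    # check whether the drive/stack advertises discard support at all
    lsblk -D /dev/sdX
    # live per-disk activity view, refreshing every 2 seconds
    atop -d 2
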
    Also for posterity and curiosity, what kind of SSDs?

    Christian

    > Log isn't showing anything useful and I have most debugging disabled.
    >
    >
    >
    > On Mon, Dec 10, 2018 at 7:43 PM Mark Nelson <mnel...@redhat.com> wrote:
    >
    > > Hi Tyler,
    > >
    > > I think we had a user a while back that reported they had background
    > > deletion work going on after upgrading their OSDs from filestore to
    > > bluestore due to PGs having been moved around.  Is it possible that
    > > your cluster is doing a bunch of work (deletion or otherwise) beyond
    > > the regular client load?  I don't remember how to check for this off
    > > the top of my head, but it might be something to investigate.  If
    > > that's what it is, we just recently added the ability to throttle
    > > background deletes:
    > >
    > > https://github.com/ceph/ceph/pull/24749
    > >
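    > > If it does turn out to be background deletes, something along these
    > > lines should let you check for slow ops and throttle the removals
    > > (osd.0 is just an example, and the exact sleep option name may differ
    > > depending on which release/backport of that PR you are running):
    > >
    > > # look at recent slow/long-running ops on one OSD via the admin socket
    > > ceph daemon osd.0 dump_historic_ops
    > > # sleep-based throttle for background removal work
    > > ceph tell osd.* injectargs '--osd_delete_sleep 1'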
    > >
    > > If the logs/admin socket don't tell you anything, you could also try
    > > using our wallclock profiler to see what the OSD is spending its time
    > > doing:
    > >
    > > https://github.com/markhpc/gdbpmp/
    > >
    > >
    > > ./gdbpmp -t 1000 -p`pidof ceph-osd` -o foo.gdbpmp
    > >
    > > ./gdbpmp -i foo.gdbpmp -t 1
    > >
    > >
    > > Mark
    > >
    > > On 12/10/18 6:09 PM, Tyler Bishop wrote:
    > > > Hi,
    > > >
    > > > I have an SSD-only cluster that I recently converted from filestore
    > > > to bluestore, and performance has totally tanked.  It was fairly
    > > > decent before, with only a little more latency than expected.  Now,
    > > > since converting to bluestore, the latency is extremely high,
    > > > SECONDS.  I am trying to determine if it is an issue with the SSDs
    > > > or with Bluestore treating them differently than filestore...
    > > > potential garbage collection?  24+ hrs ???
    > > >
    > > > I am now seeing constant 100% IO utilization on ALL of the devices
    > > > and performance is terrible!
    > > >
    > > > IOSTAT
    > > >
    > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
    > > >            1.37    0.00    0.34   18.59    0.00   79.70
    > > >
    > > > Device:  rrqm/s   wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    > > > sda        0.00     0.00   0.00    9.50    0.00     64.00    13.47     0.01    1.16    0.00    1.16   1.11   1.05
    > > > sdb        0.00    96.50   4.50   46.50   34.00  11776.00   463.14   132.68 1174.84  782.67 1212.80  19.61 100.00
    > > > dm-0       0.00     0.00   5.50  128.00   44.00   8162.00   122.94   507.84 1704.93  674.09 1749.23   7.49 100.00
    > > >
    > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
    > > >            0.85    0.00    0.30   23.37    0.00   75.48
    > > >
    > > > Device:  rrqm/s   wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    > > > sda        0.00     0.00   0.00    3.00    0.00     17.00    11.33     0.01    2.17    0.00    2.17   2.17   0.65
    > > > sdb        0.00    24.50   9.50   40.50   74.00  10000.00   402.96    83.44 2048.67 1086.11 2274.46  20.00 100.00
    > > > dm-0       0.00     0.00  10.00   33.50   78.00   2120.00   101.06   287.63 8590.47 1530.40 10697.96 22.99 100.00
    > > >
    > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
    > > >            0.81    0.00    0.30   11.40    0.00   87.48
    > > >
    > > > Device:  rrqm/s   wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    > > > sda        0.00     0.00   0.00    6.00    0.00     40.25    13.42     0.01    1.33    0.00    1.33   1.25   0.75
    > > > sdb        0.00   314.50  15.50   72.00  122.00  17264.00   397.39    61.21 1013.30  740.00 1072.13  11.41  99.85
    > > > dm-0       0.00     0.00  10.00  427.00   78.00  27728.00   127.26   224.12  712.01 1147.00  701.82   2.28  99.85
    > > >
    > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
    > > >            1.22    0.00    0.29    4.01    0.00   94.47
    > > >
    > > > Device:  rrqm/s   wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    > > > sda        0.00     0.00   0.00    3.50    0.00     17.00     9.71     0.00    1.29    0.00    1.29   1.14   0.40
    > > > sdb        0.00     0.00   1.00   39.50    8.00  10112.00   499.75    78.19 1711.83 1294.50 1722.39  24.69 100.00
    > > >
    > > >


--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
