On Tue, Mar 13, 2018 at 2:56 PM David C <dcsysengin...@gmail.com> wrote:

> Thanks for the detailed response, Greg. A few follow ups inline:
>
>
> On 13 Mar 2018 20:52, "Gregory Farnum" <gfar...@redhat.com> wrote:
>
> On Tue, Mar 13, 2018 at 12:17 PM, David C <dcsysengin...@gmail.com> wrote:
> > Hi All
> >
> > I have a Samba server that is exporting directories from a CephFS kernel
> > mount. Performance has been pretty good for the last year, but users have
> > recently been complaining of short "freezes"; these seem to coincide with
> > MDS-related slow requests in the monitor ceph.log, such as:
> >
> >> 2018-03-13 13:34:58.461030 osd.15 osd.15 10.10.10.211:6812/13367 5752 : cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019 3:7cea5bac:::10001a88b8f.00000000:head v 14085'846936) currently commit_sent
> >> 2018-03-13 13:34:59.461270 osd.15 osd.15 10.10.10.211:6812/13367 5754 : cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020 2:43bdcc3f:::10001e91a91.00000000:head v 14085'21394) currently commit_sent
> >> 2018-03-13 14:23:57.409427 osd.30 osd.30 10.10.10.212:6824/14997 5708 : cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077 2:6df955ef:::10001e93542.000000c4:head v 14085'21296) currently commit_sent
> >> 2018-03-13 14:23:57.409449 osd.30 osd.30 10.10.10.212:6824/14997 5709 : cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019 2:a9a56101:::10001e93542.000000c8:head v 14085'20437) currently commit_sent
> >> 2018-03-13 14:23:57.409453 osd.30 osd.30 10.10.10.212:6824/14997 5710 : cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055 2:57096bbf:::10001e93542.000000d8:head v 14085'21147) currently commit_sent
>
> Well, that means your OSDs are getting operations that commit quickly
> to a journal but are taking a while to get into the backing
> filesystem. (I assume this is on filestore based on that message
> showing up at all, but could be missing something.)
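
One way to see where those ops are spending their time is the OSD admin
socket on the node carrying the OSD in question; a rough sketch against
osd.15 from the log above:

  ceph daemon osd.15 dump_ops_in_flight   # ops currently stuck, with per-event timestamps
  ceph daemon osd.15 dump_historic_ops    # recently completed slow ops

The "events" list in the output shows how long each op sat between stages
such as commit_sent and done.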
>
>
> Yep, it's filestore. Journals are on Intel P3700 NVMe; data and metadata
> pools are both on 7200rpm SATA. Sounds like I might benefit from moving
> metadata to a dedicated SSD pool.
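
If you do go that route, a rough sketch (assuming Luminous-style device
classes and a metadata pool named "cephfs_metadata"; adjust names to your
setup) is to steer the pool onto SSD OSDs with a dedicated CRUSH rule:

  ceph osd crush rule create-replicated replicated-ssd default host ssd
  ceph osd pool set cephfs_metadata crush_rule replicated-ssd

Expect some data movement on the metadata pool when the rule switches over.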
>
> In the meantime, are there any recommended tunables? Filestore max/min
> sync interval for example?
>

Well, you can try. I'm not sure what the most successful deployments look
like.

If you turn up the min sync interval you stand a better chance of only
doing one write to your HDD if files get overwritten, for instance. But it
may also mean that your commits end up taking so long that you get worse IO
stalls, if there's no opportunity to coalesce and the reality is just that
you're trying to push more IOs through the system than the backing HDDs can
support.
-Greg
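
For reference, a hedged example of poking those filestore knobs at runtime
(the values are illustrative only; the defaults are roughly 0.01s min and
5s max):

  ceph tell osd.* injectargs '--filestore_min_sync_interval 0.1 --filestore_max_sync_interval 10'
  # to make it persistent, in ceph.conf under [osd]:
  #   filestore min sync interval = 0.1
  #   filestore max sync interval = 10

Watch the slow request warnings and iostat before and after; if the HDDs
are simply saturated, no sync interval will help.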


>
> >
> >
> > --
> >
> > Looking in the MDS log, with debug set to 4, it's full of "setfilelockrule 1" and "setfilelockrule 2":
> >
> >> 2018-03-13 14:23:00.446905 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162337 setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 120, length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155, caller_gid=1131{}) v2
> >> 2018-03-13 14:23:00.447050 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162338 setfilelockrule 2, type 4, owner 14971048137043556787, pid 4632, start 0, length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0, caller_gid=0{}) v2
> >> 2018-03-13 14:23:00.447258 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162339 setfilelockrule 2, type 4, owner 14971048137043550643, pid 4632, start 0, length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0, caller_gid=0{}) v2
> >> 2018-03-13 14:23:00.447393 7fde43e73700  4 mds.0.server handle_client_request client_request(client.9174621:141162340 setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 124, length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155, caller_gid=1131{}) v2
>
> And that is clients setting (and releasing) advisory locks on files. I
> don't think this should directly have anything to do with the slow OSD
> requests (file locking is ephemeral state, not committed to disk), but
> if you have new applications running which are taking file locks on
> shared files that could definitely impede other clients and slow
> things down more generally.
> -Greg
>
>
> Sounds like that could be a red herring then. It seems like my issue is
> users chucking lots of small writes at the cephfs mount.
>
>
> >
> >
> > --
> >
> > I don't have a particularly good monitoring setup on this cluster yet, but
> > a cursory look at a few things such as iostat doesn't suggest the OSDs are
> > being hammered.
> >
> > Some questions:
> >
> > 1) Can anyone recommend a way of diagnosing this issue?
> > 2) Are the multiple "setfilelockrule" entries per inode to be expected? I
> > assume this is something to do with the Samba oplocks.
>
> Hmm, you might be right about the oplocks. That's an output format
> error btw, it should be "setfilelock" (the op type) and a separate
> word " rule " (indicating the type of lock, 1 for shared and 2 for
> exclusive).
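
If you want to gauge how much of that lock traffic is coming through Samba,
a quick look on the gateway host with the stock Samba tools is a reasonable
first step:

  smbstatus --locks   # currently locked files and oplock state as Samba sees them

Whether it's worth changing Samba's locking behaviour depends on whether any
other CephFS clients need to see those locks.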
>
>
> > 3) What's the recommended highest MDS debug setting before performance
> > starts to be adversely affected (I'm aware log files will get huge)?
>
> There's not a good answer here. If you actually encounter a bug
> developers will want at least 10, and probably 20, but both of those
> have measurable performance impacts. :/
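
Bumping the level at runtime and reverting it as soon as you've reproduced
the stall keeps the huge-log window small; a rough sketch (baseline values
may differ on your cluster):

  ceph tell mds.0 injectargs '--debug_mds 10'
  # ...reproduce the problem, grab the log, then drop it back:
  ceph tell mds.0 injectargs '--debug_mds 1'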
>
> > 4) What's the best way of matching inodes in the MDS log to the file names
> > in cephfs?
>
> If you have an actual log, then before the inode gets used it has to be
> found, and that lookup will be by path. You can also look at the xattrs
> on the object in rados; one of them contains a "backtrace" of the path.
> -Greg
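
A rough sketch of the xattr route for inode 0x10001e8dc37 from the log above
(assumes the data pool is named "cephfs_data"; for a regular file the
backtrace lives on its first object, <inode hex>.00000000):

  rados -p cephfs_data getxattr 10001e8dc37.00000000 parent > /tmp/parent
  ceph-dencoder type inode_backtrace_t import /tmp/parent decode dump_json

The decoded backtrace lists the ancestor dentries, which gives you the path
as of the last time it was written.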
>
>
>