> On Oct. 22, 2015, 9:48 a.m., Steven Hartland wrote:
> > Quite a bit to go through in detail, but unless I'm missing something in
> > my initial pass, my high-level concern would be around TRIM causing I/O
> > saturation for leaf-level vdevs.
> >
> > As you've mentioned, TRIM requests are quite heavy compared with reads /
> > writes, and different devices perform significantly differently due to
> > internal differences in how they process free requests, be that TRIM,
> > UNMAP, ZERO, etc.
> >
> > When we added ZFS TRIM support to FreeBSD, one of the main issues was
> > ensuring TRIM throughput was kept at a rate that didn't significantly
> > impact read / write I/Os.
> >
> > In order to do this we changed the initial fixed TXG backlog, which seems
> > to be what you have here, to be more dynamic.
> >
> > We now take into account the following factors per leaf vdev:
> > 1. TXG max delay (trim_txg_delay - default 32)
> > 2. Time max delay (trim_timeout - default 30s)
> > 3. Max pending segments (trim_vdev_max_pending - default 10k)
> > 4. Max interval between queue processing (trim_max_interval - default 1s)
> >
> > In addition, we issue TRIM I/Os using a different priority which has:
> > 1. min active = 1
> > 2. max active = 64
> 
> Saso Kiselkov wrote:
> Not sure there is much of an analogy between FreeBSD's implementation and
> this one here. The primary reason why I didn't try to do any kind of
> intelligent queueing and delaying is that it is largely unpredictable.
> Unlike read/write, which has a more-or-less linear load based on I/O size,
> TRIM is completely different. It can complete immediately or take several
> seconds, depending on whether the underlying device needs to unmap lots of
> LBAs or none. We don't know this, due to the underlying block remapping
> that many devices do and won't tell us about. I'm not opposed to thinking
> about ways of doing it; it's just that I haven't been able to come up with
> a suitably generic algorithm that doesn't suck on some devices while
> excelling on others.
> 
> If I could draw on your knowledge of FreeBSD's implementation, though, I'd
> like to ask how FreeBSD *actually* issues TRIM/UNMAP commands. AFAICT, the
> upper layers just do single-extent BIO_DELETE to GELI and then some
> underlying magic happens which I'm not quite clear about. Most importantly,
> I'm very curious as to whether FreeBSD somehow intelligently re-aggregates
> issued extents and bunches them up into a single command, or issues one
> extent per command. I'm positive Linux does the latter, which is horridly
> inefficient, and if FreeBSD does the same, that's where 90% of your
> TRIM/UNMAP performance suckage comes from.
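To put the four per-vdev factors listed above in more concrete terms, the
decision boils down to roughly the following check. This is only a simplified
sketch rather than the actual FreeBSD code: the tunable names and defaults
come from the list above, while the struct and function names are invented
for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    /* Tunables, with the defaults listed above. */
    static uint64_t trim_txg_delay = 32;           /* TXGs a free may age */
    static uint64_t trim_timeout = 30;             /* seconds a free may age */
    static uint64_t trim_vdev_max_pending = 10000; /* max queued segments */
    static uint64_t trim_max_interval = 1;         /* seconds between passes */

    /* Hypothetical per-leaf-vdev TRIM queue state. */
    typedef struct trim_queue {
            uint64_t tq_pending_segs; /* segments currently queued */
            uint64_t tq_oldest_txg;   /* txg of the oldest queued free */
            uint64_t tq_oldest_time;  /* time (s) of the oldest queued free */
            uint64_t tq_last_issue;   /* time (s) the queue was last processed */
    } trim_queue_t;

    /*
     * Return true if the queued TRIMs for this vdev should be issued now.
     * Exceeding any one of the limits kicks the queue; otherwise the TRIMs
     * keep waiting so they don't starve read / write I/O.
     */
    bool
    trim_queue_should_issue(const trim_queue_t *tq, uint64_t cur_txg,
        uint64_t now)
    {
            if (tq->tq_pending_segs == 0)
                    return (false);
            if (cur_txg - tq->tq_oldest_txg >= trim_txg_delay)
                    return (true);
            if (now - tq->tq_oldest_time >= trim_timeout)
                    return (true);
            if (tq->tq_pending_segs >= trim_vdev_max_pending)
                    return (true);
            if (now - tq->tq_last_issue >= trim_max_interval)
                    return (true);
            return (false);
    }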
Yes, if you're doing manual TRIM passes over data which has already been
TRIMmed, via automatic TRIM for example, then some devices do indeed return
much quicker, as they skip the physical request internally as an
optimisation; however, in the general automatic case, without TRIM throughput
management you'll almost certainly see noticeable read / write stalls.

FreeBSD has two max-size variables in its disk GEOM provider: d_maxsize and
d_delmaxsize. I added the latter to avoid the situation you describe above,
allowing larger individual delete requests to be passed from the file system
layer through to the I/O layer. d_delmaxsize defaults to d_maxsize, but it
can be (and is) overridden by CAM and others for devices that expose their
own maximum delete size.

On top of this, the disk GEOM provider splits incoming delete requests
against a global maximum. This has been tuned, based on experience, to
provide a good blend of delete throughput and interactivity (e.g. the ability
to cancel a high-level delete request) by splitting it into smaller
kernel-level requests.

We also have aggregation for TRIM and UNMAP requests within the CAM layer,
which combines individual requests from higher layers into a single ATA /
SCSI request for performance; however, this is pretty lightweight.

The main requirement driving throughput management is mitigating the
performance issues caused by slow device response times, not slow system
processing. We found that if you let delete requests build up too much, the
underlying device ends up spending excessive time servicing deletes versus
normal traffic; however, if you issue deletes in groups that are too small,
the device's delete throughput is penalised excessively, so it's important to
strike a good balance within all components of the stack.
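To illustrate the splitting side of this, here's a rough sketch of the idea;
it is not the actual GEOM code, and the function names, callback type and
chunk-size figure are made up purely for illustration. Each high-level delete
is carved into chunks no larger than d_delmaxsize before being handed further
down, so a single huge TRIM/UNMAP never monopolises the device and normal I/O
can be interleaved between chunks:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative chunk cap only; not a real FreeBSD default. */
    #define DELMAXSIZE_DEFAULT (256ULL * 1024 * 1024)

    /* Hypothetical callback that issues one delete chunk to the device. */
    typedef void (*issue_delete_fn)(uint64_t offset, uint64_t length);

    /* Carve [offset, offset + length) into chunks of at most d_delmaxsize. */
    static void
    split_delete(uint64_t offset, uint64_t length, uint64_t d_delmaxsize,
        issue_delete_fn issue)
    {
            while (length > 0) {
                    uint64_t chunk =
                        (length > d_delmaxsize) ? d_delmaxsize : length;
                    issue(offset, chunk);
                    offset += chunk;
                    length -= chunk;
            }
    }

    /* Example: show the chunks a 1 GiB delete would be split into. */
    static void
    print_chunk(uint64_t offset, uint64_t length)
    {
            printf("delete offset=%llu length=%llu\n",
                (unsigned long long)offset, (unsigned long long)length);
    }

    int
    main(void)
    {
            split_delete(0, 1ULL << 30, DELMAXSIZE_DEFAULT, print_chunk);
            return (0);
    }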
With regard to GELI (FreeBSD's encrypted device provider), the last time I
looked it didn't support BIO_DELETE, though there was some work happening in
that area, so I think you may be referring to GEOM?

Hope this helps.

- Steven


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.csiden.org/r/263/#review849
-----------------------------------------------------------


On Oct. 31, 2015, 2:52 a.m., Saso Kiselkov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.csiden.org/r/263/
> -----------------------------------------------------------
> 
> (Updated Oct. 31, 2015, 2:52 a.m.)
> 
> 
> Review request for OpenZFS Developer Mailing List and Christopher Siden.
> 
> 
> Repository: illumos-gate
> 
> 
> Description
> -------
> 
> Adds support for issuing abstract DKIOCFREE ioctls from ZFS and adds
> support for translating these ioctls into SCSI UNMAP commands to sd.
> This is an upstream of work by Nexenta.
> 
> 
> Diffs
> -----
> 
> usr/src/uts/common/sys/sysevent/eventdefs.h 9c6907a08af65665cdb09588c3b0ef89f087d70c
> usr/src/uts/common/sys/scsi/targets/sddef.h 39c0ed9d0fb2c2d6c20fa793c3b5f9168a844552
> usr/src/uts/common/sys/fs/zfs.h bc9f057dd1361ae73a12375515abacd0fed820d2
> usr/src/uts/common/sys/dkioc_free_util.h PRE-CREATION
> usr/src/uts/common/sys/Makefile 39288d5cc0dd2b025f8e8661664f3e77c9fa9272
> usr/src/uts/common/os/dkioc_free_util.c PRE-CREATION
> usr/src/uts/common/io/scsi/targets/sd.c ae1e7e0fc3e51957dd66158c960251550ed9890d
> usr/src/uts/common/io/comstar/lu/stmf_sbd/stmf_sbd.h efbc7268ea7aab11b1d726551058d38e71bf376d
> usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd.c e8a4b131380376ea0dc028c01a6350cb438ecff2
> usr/src/uts/common/fs/zfs/sys/zio_impl.h 08f820103e823681031100c8b2f65f8661e8293e
> usr/src/uts/common/fs/zfs/sys/zio.h 877e2839bf804ca3625c907367fed4a06744ea52
> usr/src/uts/common/fs/zfs/sys/vdev_impl.h 17a18a319934908190409a0eeb5b18ff83b9e001
> usr/src/uts/common/fs/zfs/sys/vdev.h 08ce54b18c78a6f5b5a59658e46655e66007aa5c
> usr/src/uts/common/fs/zfs/sys/spa_impl.h 441800198215e53fec306f6e5246642f9076d8e4
> usr/src/uts/common/fs/zfs/sys/metaslab_impl.h 27a53b515fbc48ab5b200e88259df91cb6effe19
> usr/src/uts/common/fs/zfs/sys/metaslab.h b3b9374c779f2460ca001577a84ade28210cc7ce
> usr/src/uts/common/fs/zfs/zvol.c 585500bbac86d23fb82eea7ed5e3e7d3970a25e5
> usr/src/uts/common/fs/zfs/zio.c 4a3dafac9f7f391a9f16c06932420f655dde61f0
> usr/src/uts/common/fs/zfs/vdev_root.c a5442a55eb5a23fbd6f5cf76f8b348670ad2e621
> usr/src/uts/common/fs/zfs/vdev_raidz.c 085d1250a1ad7bef0ab4a1a2e92b90eb308ee455
> usr/src/uts/common/fs/zfs/vdev_missing.c 228757334234d241f980058397438d3a80716dcf
> usr/src/uts/common/fs/zfs/vdev_mirror.c 8749e539f46682f3bb073fbf119004c6ecc64177
> usr/src/uts/common/fs/zfs/vdev_disk.c ed4a8b773bf4d9a5f19a9b679bd393729ff529cc
> usr/src/uts/common/fs/zfs/vdev.c 1c57fce4dcee909b164353181dcd8e2a29ed7946
> usr/src/uts/common/fs/zfs/spa_misc.c 6f255df85025d95675d5c4750ab762c7258b7b90
> usr/src/uts/common/fs/zfs/spa.c 95a6b0fae7760e8a1e8cfc1e657dc22fd9ef3245
> usr/src/uts/common/fs/zfs/metaslab.c 852534eff8bd8cb4351fe720e0b71bef255bbfaa
> usr/src/uts/common/fs/zfs/dsl_scan.c 53902793ef5ccce6e4c79d013df36bf381677ee9
> usr/src/uts/common/Makefile.files c90a5c1773bf91d2437fcbfa90da81a48070ce70
> usr/src/man/man1m/zpool.1m fbfd39357930d7550226ba7ca180042b96c28c1a
> usr/src/lib/libzpool/common/sys/zfs_context.h 9e4d8ed0b8ec42be75bb93f44602ac99e907cf00
> usr/src/lib/libzfs/common/mapfile-vers dc72ab001049757ae5b6eac56716dbc954046f3c
> usr/src/lib/libzfs/common/libzfs_pool.c 3c992951793d2ef5d63c31348ec1b0754d5d6964
> usr/src/lib/libzfs/common/libzfs.h 1ebf520297365b423fce4635646533804d1eaad9
> usr/src/common/zfs/zpool_prop.c 4d906b02bc02e80a7b0aae7af66898b6e5d1ae79
> usr/src/cmd/zpool/zpool_main.c 18af5a2763039694d1c6acddb89f26aa7a2b12d1
> usr/src/uts/common/sys/dkio.h a5b0c312f9df59a7171778411ccaff654c5b27e8
> usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd_scsi.c cb6e115fe949145d39865333353ef50baf49c7da
> usr/src/uts/common/fs/zfs/zfs_ioctl.c c863cbd399a88dd70edcd685fde6dadffbac3ff7
> usr/src/uts/common/fs/zfs/vdev_label.c f0924ab1e66eaa678540da8925995da6e0e2a29c
> usr/src/uts/common/fs/zfs/vdev_file.c 5dfc331d20a480047e44a59bb504319b34440e17
> usr/src/uts/common/fs/zfs/sys/spa.h 7ac78390338a44f7b7658017e1ae8fcc9beb89d6
> usr/src/uts/common/fs/zfs/sys/range_tree.h 9f3ead537165f3a7b8c52fe58eedef66c1b1952e
> usr/src/uts/common/fs/zfs/spa_config.c 47bb59590893cb72ca7f2cee397566c7e466d6d4
> usr/src/uts/common/fs/zfs/range_tree.c 6422fd1c1fa6ecf2a9283fefeabca772f6b0a76a
> usr/src/pkg/manifests/system-header.mf 4551ca095c0e25f47ddcb4e32d75c8eaaba0c1e4
> usr/src/lib/libzpool/Makefile.com da5da5d93682faf90a71e329f6345409513496dc
> 
> Diff: https://reviews.csiden.org/r/263/diff/
> 
> 
> Testing
> -------
> 
> Run on an assortment of raidz, mirror and straight vdevs.
> 
> 
> Thanks,
> 
> Saso Kiselkov
> 
_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer