Prakash, thanks for sharing your code. I am glad I am not the only one who has hit this issue! I will look at the ZFS on Linux source tree for other changes as well.

=Lakhinder
> Date: Mon, 23 Jun 2014 13:46:32 -0700
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
> Subject: Re: [OpenZFS Developer] zfs metadata caching and back-to-back 'find' commands
>
> On Sun, Jun 22, 2014 at 08:14:01PM -0700, Matthew Ahrens wrote:
> > On Mon, Jun 16, 2014 at 12:42 PM, Lakhinder Walia <[email protected]> wrote:
> >
> > Hi.
> >
> > I have a FreeBSD 10.0-STABLE #0 r264414 based system with 32 GB RAM
> > running ZFS/Geli.
> >
> > I would like to understand whether the response pattern to these steps
> > shows a bug.
> >
> > Steps (details below):
> >
> > A. Boot the system. Create a large file, 5x the RAM, using dd.
> >
> > B. Run a 'find' on a zfs directory (tree) with approx. 32 million files.
> >
> > C. Run the 'find' command _back-to-back_ for a total of 4 times.
> >
> > 'find' response time goes down from:
> > 32 min -> 17 min -> 1+ min -> 17 seconds.
> > (No other significant zfs i/o is going on at this time.)
> >
> > Questions:
> >
> > Q1. Why are the second and third back-to-back 'find' runs taking so
> > long? The MFU should have (AFAIK) held on to the metadata after the
> > first 'find' operation was over. Shouldn't the subsequent 'find' runs
> > have executed faster!? (There is no other i/o-intensive job running.
> > No NFS clients are present in the tests, even though NFS sharing is
> > configured.)
> >
> > Have you looked at how much I/O and CPU are used in each of the 4 runs?
> > This would help identify whether the problem is that we are not caching
> > the data in the ARC, or whether some other data structure is being torn
> > down and re-created (e.g. vnode_t, znode_t, dnode_t). I would guess that
> > the 17-minute run is I/O bound, but the 1-minute run might be CPU bound.
> >
> > If it is I/O bound, the next step would be to determine why the data is
> > not cached after the first run. You might start by looking at the arc
> > stats before and after each run.
> >
> > If it is CPU bound, a flame graph might quickly show the problem area:
> > http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
> >
> > It's possible that after the first run we are not using enough of the
> > ARC to cache metadata (despite the fact that you've raised the metadata
> > limit high enough). I recently did some work to address this issue,
> > which is not yet integrated into illumos or FreeBSD. You can see it here:
> >
> > http://reviews.csiden.org/r/50
>
> We've done work in ZoL to address this as well:
> https://github.com/zfsonlinux/zfs/commit/da8ccd0e
>
> --
> Cheers, Prakash
>
> > Q2. I have another issue, though not detailed here. A ruby script
> > doing a traversal of all the files via its 'Find' API gets to be very,
> > very slow, particularly after a new 'find' job kicks in in parallel,
> > due to a default system security policy
> > (/etc/periodic/security/110.neggrpperm). When I kill the second 'find',
> > the ruby script's 'Find'-based traversal _still_ continues at its very
> > slow pace, for hours. A script that just lists files, and could have
> > finished in two hours, instead takes approx. 10 hours, even when it is
> > the only file i/o going on.
> >
> > Are both 'find' commands looking at the same directory? If not, the new
> > 'find' command could be displacing some of the cache entries needed by
> > the first command.
> >
> > Steps to analyze this are similar to what I mentioned above -- determine
> > whether it is CPU or I/O bound, then what is causing that.
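A minimal sketch of the per-run measurement suggested above (untested; it assumes FreeBSD's kstat.zfs.misc.arcstats sysctl tree, and relies on /usr/bin/time -l reporting block input/output operation counts alongside CPU time, which should show whether a given run is I/O or CPU bound):

  #!/bin/sh
  # arcdiff.sh -- hypothetical wrapper: snapshot ARC stats around one command.
  before=$(mktemp); after=$(mktemp)
  sysctl kstat.zfs.misc.arcstats > "$before"
  /usr/bin/time -l "$@"            # prints user/sys time plus block in/out ops
  sysctl kstat.zfs.misc.arcstats > "$after"
  diff "$before" "$after" | grep -E 'hits|misses'   # which counters moved
  rm -f "$before" "$after"

Invoked once per run as, e.g., sh arcdiff.sh sh -c 'find . | wc'.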
> > Q3. Could the two issues above possibly be related?
> >
> > I'm not sure.
> >
> > --matt
> >
> > Thanks,
> >
> > =Lakhinder
> >
> > Steps:
> > ==============================================
> >
> > # Step A. Freshly booted system.
> > root@fbsd_10:/zdata_1/xx #
> > $ dd if=/dev/zero of=file_2 bs=16M count=10240; date
> > 10240+0 records in
> > 10240+0 records out
> > 171798691840 bytes transferred in 192.188421 secs (893907609 bytes/sec)
> >
> > # Step B.
> > # cd over to another directory with approx. 32 million files
> > # in its various subdirectories.
> > root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> > Sun Jun 15 15:52:32 PDT 2014
> > 32673014 32673014 2378434006
> > Sun Jun 15 16:24:22 PDT 2014
> > # time taken is approx. 32 minutes.
> >
> > # Step C-1.
> > root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> > Sun Jun 15 16:24:22 PDT 2014
> > 32673014 32673014 2378434006
> > Sun Jun 15 16:41:56 PDT 2014
> > # time taken is approx. 17 minutes.
> >
> > # Step C-2.
> > root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> > Sun Jun 15 16:41:56 PDT 2014
> > 32673014 32673014 2378434006
> > Sun Jun 15 16:43:07 PDT 2014
> > # time taken is approx. 1+ minute.
> >
> > # Step C-3.
> > root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> > Sun Jun 15 16:43:07 PDT 2014
> > 32673014 32673014 2378434006
> > Sun Jun 15 16:43:26 PDT 2014
> > # time taken is 19 seconds.
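If the shorter runs above turn out to be CPU bound, the flame graph Matt points to can be collected on FreeBSD with DTrace plus the stackcollapse.pl and flamegraph.pl scripts from the FlameGraphs page linked earlier. A rough sketch; the 997 Hz sample rate and 60 s window are arbitrary choices:

  # Sample on-CPU kernel stacks while a 'find' run is active:
  dtrace -x stackframes=100 \
      -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
      -o out.kern_stacks
  stackcollapse.pl out.kern_stacks > out.folded   # fold stacks
  flamegraph.pl out.folded > find_run.svg         # render the SVG

The resulting SVG should make any hot kernel paths obvious.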
> > --------------- system details --------------------------------------
> >
> > OS version:
> > FreeBSD fbsd_10 10.0-STABLE FreeBSD 10.0-STABLE #0 r264414:
> > Sun Apr 13 19:33:56 UTC 2014
> > [email protected]:/usr/obj/usr/src/sys/GENERIC amd64
> >
> > cat /boot/loader.conf:
> > aesni_load="YES"
> > geli_load="YES"
> > zfs_load="YES"
> > vfs.zfs.arc_meta_limit=17179869184
> > # 16 GB limit for ARC Meta.
> > kern.maxvnodes=50000000
> >
> > Here is 'top' after running a _recent_ 'find':
> >
> > last pid: 75231;  load averages: 0.48, 0.28, 0.33  up 0+20:20:26  11:58:50
> > 37 processes: 1 running, 36 sleeping
> > CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
> > Mem: 28M Active, 796K Inact, 30G Wired, 2620K Cache, 97M Buf, 911M Free
> > ARC: 24G Total, 4756M MFU, 18G MRU, 16K Anon, 385M Header, 1308M Other
> > Swap: 1638M Total, 41M Used, 1597M Free, 2% Inuse
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >  2486 root        1  21    0 34980K  2236K nanslp  6   1:46  0.00% perl5.16.3
> >  2478 root        1  20    0 56300K 15720K select  3   1:14  0.00% tmux
> >  2483 root        1  20    0 19824K  1612K CPU5    5   0:47  0.00% top
> > ****
> >
> > # after the _recent_ refresh with 'find':
> > root@fbsd_10:/ # sysctl vfs.zfs
> >
> > vfs.zfs.arc_max: 32373334016
> > vfs.zfs.arc_min: 8589934592
> > vfs.zfs.arc_meta_used: 17180938800
> > vfs.zfs.arc_meta_limit: 17179869184
> > vfs.zfs.l2arc_write_max: 8388608
> > vfs.zfs.l2arc_write_boost: 8388608
> > vfs.zfs.l2arc_headroom: 2
> > vfs.zfs.l2arc_feed_secs: 1
> > vfs.zfs.l2arc_feed_min_ms: 200
> > vfs.zfs.l2arc_noprefetch: 1
> > vfs.zfs.l2arc_feed_again: 1
> > vfs.zfs.l2arc_norw: 1
> > vfs.zfs.anon_size: 16384
> > vfs.zfs.anon_metadata_lsize: 0
> > vfs.zfs.anon_data_lsize: 0
> > vfs.zfs.mru_size: 18898130944
> > vfs.zfs.mru_metadata_lsize: 10115874816
> > vfs.zfs.mru_data_lsize: 8479965184
> > vfs.zfs.mru_ghost_size: 6764165120
> > vfs.zfs.mru_ghost_metadata_lsize: 6764165120
> > vfs.zfs.mru_ghost_data_lsize: 0
> > vfs.zfs.mfu_size: 4986851840
> > vfs.zfs.mfu_metadata_lsize: 4631966208
> > vfs.zfs.mfu_data_lsize: 0
> > vfs.zfs.mfu_ghost_size: 2492798976
> > vfs.zfs.mfu_ghost_metadata_lsize: 2492798976
> > vfs.zfs.mfu_ghost_data_lsize: 0
> > vfs.zfs.l2c_only_size: 0
> > vfs.zfs.dedup.prefetch: 1
> > vfs.zfs.nopwrite_enabled: 1
> > vfs.zfs.mdcomp_disable: 0
> > vfs.zfs.prefetch_disable: 0
> > vfs.zfs.zfetch.max_streams: 8
> > vfs.zfs.zfetch.min_sec_reap: 2
> > vfs.zfs.zfetch.block_cap: 256
> > vfs.zfs.zfetch.array_rd_sz: 1048576
> > vfs.zfs.top_maxinflight: 32
> > vfs.zfs.resilver_delay: 2
> > vfs.zfs.scrub_delay: 4
> > vfs.zfs.scan_idle: 50
> > vfs.zfs.scan_min_time_ms: 1000
> > vfs.zfs.free_min_time_ms: 1000
> > vfs.zfs.resilver_min_time_ms: 3000
> > vfs.zfs.no_scrub_io: 0
> > vfs.zfs.no_scrub_prefetch: 0
> > vfs.zfs.metaslab.gang_bang: 131073
> > vfs.zfs.metaslab.debug_load: 0
> > vfs.zfs.metaslab.debug_unload: 0
> > vfs.zfs.metaslab.df_alloc_threshold: 131072
> > vfs.zfs.metaslab.df_free_pct: 4
> > vfs.zfs.metaslab.min_alloc_size: 10485760
> > vfs.zfs.metaslab.load_pct: 50
> > vfs.zfs.metaslab.unload_delay: 8
> > vfs.zfs.metaslab.preload_limit: 3
> > vfs.zfs.metaslab.preload_enabled: 1
> > vfs.zfs.metaslab.weight_factor_enable: 0
> > vfs.zfs.condense_pct: 200
> > vfs.zfs.mg_alloc_failures: 12
> > vfs.zfs.mg_noalloc_threshold: 0
> > vfs.zfs.write_to_degraded: 0
> > vfs.zfs.check_hostid: 1
> > vfs.zfs.recover: 0
> > vfs.zfs.deadman_synctime_ms: 1000000
> > vfs.zfs.deadman_checktime_ms: 5000
> > vfs.zfs.deadman_enabled: 0
> > vfs.zfs.spa_asize_inflation: 24
> > vfs.zfs.txg.timeout: 5
> > vfs.zfs.vdev.cache.max: 16384
> > vfs.zfs.vdev.cache.size: 0
> > vfs.zfs.vdev.cache.bshift: 16
> > vfs.zfs.vdev.trim_on_init: 1
> > vfs.zfs.vdev.max_active: 1000
> > vfs.zfs.vdev.sync_read_min_active: 10
> > vfs.zfs.vdev.sync_read_max_active: 10
> > vfs.zfs.vdev.sync_write_min_active: 10
> > vfs.zfs.vdev.sync_write_max_active: 10
> > vfs.zfs.vdev.async_read_min_active: 1
> > vfs.zfs.vdev.async_read_max_active: 3
> > vfs.zfs.vdev.async_write_min_active: 1
> > vfs.zfs.vdev.async_write_max_active: 10
> > vfs.zfs.vdev.scrub_min_active: 1
> > vfs.zfs.vdev.scrub_max_active: 2
> > vfs.zfs.vdev.aggregation_limit: 131072
> > vfs.zfs.vdev.read_gap_limit: 32768
> > vfs.zfs.vdev.write_gap_limit: 4096
> > vfs.zfs.vdev.bio_flush_disable: 0
> > vfs.zfs.vdev.bio_delete_disable: 0
> > vfs.zfs.vdev.trim_max_bytes: 2147483648
> > vfs.zfs.vdev.trim_max_pending: 64
> > vfs.zfs.max_auto_ashift: 13
> > vfs.zfs.zil_replay_disable: 0
> > vfs.zfs.cache_flush_disable: 0
> > vfs.zfs.zio.use_uma: 1
> > vfs.zfs.zio.exclude_metadata: 0
> > vfs.zfs.sync_pass_deferred_free: 2
> > vfs.zfs.sync_pass_dont_compress: 5
> > vfs.zfs.sync_pass_rewrite: 2
> > vfs.zfs.snapshot_list_prefetch: 0
> > vfs.zfs.super_owner: 0
> > vfs.zfs.debug: 0
> > vfs.zfs.version.ioctl: 3
> > vfs.zfs.version.acl: 1
> > vfs.zfs.version.spa: 5000
> > vfs.zfs.version.zpl: 5
> > vfs.zfs.trim.enabled: 1
> > vfs.zfs.trim.txg_delay: 32
> > vfs.zfs.trim.timeout: 30
> > vfs.zfs.trim.max_interval: 1
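One detail worth flagging in the dump above: vfs.zfs.arc_meta_used (17180938800) sits right at vfs.zfs.arc_meta_limit (17179869184), so the metadata portion of the ARC is pinned at its cap. With approx. 32.67 million files, a 16 GiB cap works out to only about 525 bytes of cached metadata per file, so it is plausible the full metadata working set never fits. A trivial sketch to watch this while a run is active, using the same sysctls:

  # Print arc_meta_used vs. arc_meta_limit once a second:
  while :; do
      sysctl -n vfs.zfs.arc_meta_used vfs.zfs.arc_meta_limit | xargs
      sleep 1
  done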
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
