On Mon, Jun 16, 2014 at 12:42 PM, Lakhinder Walia <[email protected]> wrote:
> Hi.
>
> I have a FreeBSD 10.0-STABLE #0 r264414 based system with 32 GB RAM,
> running ZFS on GELI.
>
> I would like to understand whether the response pattern to these
> steps shows a bug.
>
> Steps (details below):
>
> A. Boot the system. Create a large file, 5x the RAM, using dd.
>
> B. Run a 'find' on a ZFS directory tree with approx. 32 million files.
>
> C. Run the 'find' command _back-to-back_ for a total of 4 times.
>
> 'find' response time goes down from:
> 32 min -> 17 min -> 1+ min -> 19 seconds.
> (No other significant ZFS I/O is going on at this time.)
>
> Questions:
>
> Q1. Why are the second and third back-to-back 'find' runs taking so
> long? MFU should have (AFAIK) held on to the metadata after the first
> 'find' operation was over. Shouldn't the subsequent runs have executed
> faster? (There is no other I/O-intensive job running. No NFS clients
> are present in the tests, even though NFS sharing is configured.)

Have you looked at how much I/O and CPU are used in each of the 4 runs?
This would help identify whether the problem is that we are not caching
the data in the ARC, or that some other data structure is being torn
down and re-created (e.g. vnode_t, znode_t, dnode_t). I would guess that
the 17-minute run is I/O-bound, but the 1-minute run might be CPU-bound.

If it is I/O-bound, the next step would be to determine why the data is
not cached after the first run. You might start by looking at the ARC
stats before and after each run.
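For example, something along these lines (a rough sketch; on FreeBSD the
ARC counters live under the kstat.zfs.misc.arcstats sysctl tree, and the
grep pattern below is just illustrative):

  # snapshot the ARC hit/miss counters, time the run, then compare
  sysctl kstat.zfs.misc.arcstats | egrep 'hits|misses' > arc.before
  /usr/bin/time -l find . | wc      # -l also reports rusage details
  sysctl kstat.zfs.misc.arcstats | egrep 'hits|misses' > arc.after
  diff arc.before arc.after

  # meanwhile, in another terminal, watch disk vs. CPU:
  iostat -x 5      # high %b on the disks => I/O-bound
  top -S           # mostly system (kernel) time => CPU-bound

If the miss counters climb during the slow runs, the metadata is not
being cached (or is not staying cached); if hits dominate and the disks
are mostly idle, look at CPU instead.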
If it is CPU-bound, a flame graph might quickly show the problem area:
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
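Roughly, while one of the slow 'find' runs is in flight (again a sketch;
this assumes the DTrace modules are loaded and that the stackcollapse.pl
and flamegraph.pl scripts from the page above are checked out locally):

  kldload dtraceall      # if DTrace is not already loaded

  # sample kernel stacks at 197 Hz for 60 seconds
  # (the /arg0/ predicate restricts samples to kernel context)
  dtrace -x stackframes=100 \
      -n 'profile-197 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
      -o out.stacks

  # fold the stacks and render the SVG
  ./stackcollapse.pl out.stacks > out.folded
  ./flamegraph.pl out.folded > find.svg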
It's possible that after the first run we are not using enough of the
ARC to cache metadata (despite the fact that you've raised the metadata
limit large enough). I recently did some work to address this issue,
which is not yet integrated into illumos or FreeBSD. You can see it
here:
http://reviews.csiden.org/r/50

> Q2. I have another issue, though not detailed here. A Ruby script
> doing a traversal of all the files via its 'Find' API gets to be very,
> very slow, particularly after a new 'find' job kicks in, in parallel,
> due to a default system security policy:
> /etc/periodic/security/110.neggrpperm. When I kill the second 'find',
> the Ruby script's 'Find'-based traversal _still_ continues at its very
> slow pace, for hours. A script that is just listing files, and could
> have finished in two hours, instead takes approx. 10 hours, even when
> that is the only file I/O going on.

Are both 'find' commands looking at the same directory? If not, the new
'find' command could be displacing some of the cache entries needed by
the first command. Steps to analyze this are similar to what I mentioned
above -- determine whether it is CPU- or I/O-bound, then what is causing
that (see the monitoring sketch after the sysctl output below for one
way to watch for displacement).

> Q3. Could the two issues above possibly be related?

I'm not sure.

--matt

> Thanks,
>
> =Lakhinder
>
>
> Steps:
> =====================================================================
>
> # Step A. Freshly booted system.
> root@fbsd_10:/zdata_1/xx # dd if=/dev/zero of=file_2 bs=16M count=10240; date
> 10240+0 records in
> 10240+0 records out
> 171798691840 bytes transferred in 192.188421 secs (893907609 bytes/sec)
>
> # Step B.
> # cd over to another directory with approx. 32 million files
> # in its various subdirectories.
> root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> Sun Jun 15 15:52:32 PDT 2014
> 32673014 32673014 2378434006
> Sun Jun 15 16:24:22 PDT 2014
> # time taken is approx. 32 minutes.
>
> # Step C-1.
> root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> Sun Jun 15 16:24:22 PDT 2014
> 32673014 32673014 2378434006
> Sun Jun 15 16:41:56 PDT 2014
> # time taken is approx. 17 minutes.
>
> # Step C-2.
> root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> Sun Jun 15 16:41:56 PDT 2014
> 32673014 32673014 2378434006
> Sun Jun 15 16:43:07 PDT 2014
> # time taken is approx. 1+ minute.
>
> # Step C-3.
> root@fbsd_10:/zdata_1/sr_2gig # date; find . | wc; date
> Sun Jun 15 16:43:07 PDT 2014
> 32673014 32673014 2378434006
> Sun Jun 15 16:43:26 PDT 2014
> # time taken is 19 seconds.
>
> --------------- system details --------------------------------------
>
> OS version:
> FreeBSD fbsd_10 10.0-STABLE FreeBSD 10.0-STABLE #0 r264414:
> Sun Apr 13 19:33:56 UTC 2014
> [email protected]:/usr/obj/usr/src/sys/GENERIC amd64
>
> cat /boot/loader.conf:
> aesni_load="YES"
> geli_load="YES"
> zfs_load="YES"
> vfs.zfs.arc_meta_limit=17179869184
> # 16 GB limit for ARC metadata.
> kern.maxvnodes=50000000
>
> Here is 'top' after running a _recent_ 'find':
>
> last pid: 75231;  load averages: 0.48, 0.28, 0.33  up 0+20:20:26  11:58:50
> 37 processes: 1 running, 36 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
> Mem: 28M Active, 796K Inact, 30G Wired, 2620K Cache, 97M Buf, 911M Free
> ARC: 24G Total, 4756M MFU, 18G MRU, 16K Anon, 385M Header, 1308M Other
> Swap: 1638M Total, 41M Used, 1597M Free, 2% Inuse
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME  WCPU COMMAND
>  2486 root        1  21    0 34980K  2236K nanslp 6   1:46 0.00% perl5.16.3
>  2478 root        1  20    0 56300K 15720K select 3   1:14 0.00% tmux
>  2483 root        1  20    0 19824K  1612K CPU5   5   0:47 0.00% top
>
> # after the _recent_ refresh with 'find':
> root@fbsd_10:/ # sysctl vfs.zfs
> vfs.zfs.arc_max: 32373334016
> vfs.zfs.arc_min: 8589934592
> vfs.zfs.arc_meta_used: 17180938800
> vfs.zfs.arc_meta_limit: 17179869184
> vfs.zfs.l2arc_write_max: 8388608
> vfs.zfs.l2arc_write_boost: 8388608
> vfs.zfs.l2arc_headroom: 2
> vfs.zfs.l2arc_feed_secs: 1
> vfs.zfs.l2arc_feed_min_ms: 200
> vfs.zfs.l2arc_noprefetch: 1
> vfs.zfs.l2arc_feed_again: 1
> vfs.zfs.l2arc_norw: 1
> vfs.zfs.anon_size: 16384
> vfs.zfs.anon_metadata_lsize: 0
> vfs.zfs.anon_data_lsize: 0
> vfs.zfs.mru_size: 18898130944
> vfs.zfs.mru_metadata_lsize: 10115874816
> vfs.zfs.mru_data_lsize: 8479965184
> vfs.zfs.mru_ghost_size: 6764165120
> vfs.zfs.mru_ghost_metadata_lsize: 6764165120
> vfs.zfs.mru_ghost_data_lsize: 0
> vfs.zfs.mfu_size: 4986851840
> vfs.zfs.mfu_metadata_lsize: 4631966208
> vfs.zfs.mfu_data_lsize: 0
> vfs.zfs.mfu_ghost_size: 2492798976
> vfs.zfs.mfu_ghost_metadata_lsize: 2492798976
> vfs.zfs.mfu_ghost_data_lsize: 0
> vfs.zfs.l2c_only_size: 0
> vfs.zfs.dedup.prefetch: 1
> vfs.zfs.nopwrite_enabled: 1
> vfs.zfs.mdcomp_disable: 0
> vfs.zfs.prefetch_disable: 0
> vfs.zfs.zfetch.max_streams: 8
> vfs.zfs.zfetch.min_sec_reap: 2
> vfs.zfs.zfetch.block_cap: 256
> vfs.zfs.zfetch.array_rd_sz: 1048576
> vfs.zfs.top_maxinflight: 32
> vfs.zfs.resilver_delay: 2
> vfs.zfs.scrub_delay: 4
> vfs.zfs.scan_idle: 50
> vfs.zfs.scan_min_time_ms: 1000
> vfs.zfs.free_min_time_ms: 1000
> vfs.zfs.resilver_min_time_ms: 3000
> vfs.zfs.no_scrub_io: 0
> vfs.zfs.no_scrub_prefetch: 0
> vfs.zfs.metaslab.gang_bang: 131073
> vfs.zfs.metaslab.debug_load: 0
> vfs.zfs.metaslab.debug_unload: 0
> vfs.zfs.metaslab.df_alloc_threshold: 131072
> vfs.zfs.metaslab.df_free_pct: 4
> vfs.zfs.metaslab.min_alloc_size: 10485760
> vfs.zfs.metaslab.load_pct: 50
> vfs.zfs.metaslab.unload_delay: 8
> vfs.zfs.metaslab.preload_limit: 3
> vfs.zfs.metaslab.preload_enabled: 1
> vfs.zfs.metaslab.weight_factor_enable: 0
> vfs.zfs.condense_pct: 200
> vfs.zfs.mg_alloc_failures: 12
> vfs.zfs.mg_noalloc_threshold: 0
> vfs.zfs.write_to_degraded: 0
> vfs.zfs.check_hostid: 1
> vfs.zfs.recover: 0
> vfs.zfs.deadman_synctime_ms: 1000000
> vfs.zfs.deadman_checktime_ms: 5000
> vfs.zfs.deadman_enabled: 0
> vfs.zfs.spa_asize_inflation: 24
> vfs.zfs.txg.timeout: 5
> vfs.zfs.vdev.cache.max: 16384
> vfs.zfs.vdev.cache.size: 0
> vfs.zfs.vdev.cache.bshift: 16
> vfs.zfs.vdev.trim_on_init: 1
> vfs.zfs.vdev.max_active: 1000
> vfs.zfs.vdev.sync_read_min_active: 10
> vfs.zfs.vdev.sync_read_max_active: 10
> vfs.zfs.vdev.sync_write_min_active: 10
> vfs.zfs.vdev.sync_write_max_active: 10
> vfs.zfs.vdev.async_read_min_active: 1
> vfs.zfs.vdev.async_read_max_active: 3
> vfs.zfs.vdev.async_write_min_active: 1
> vfs.zfs.vdev.async_write_max_active: 10
> vfs.zfs.vdev.scrub_min_active: 1
> vfs.zfs.vdev.scrub_max_active: 2
> vfs.zfs.vdev.aggregation_limit: 131072
> vfs.zfs.vdev.read_gap_limit: 32768
> vfs.zfs.vdev.write_gap_limit: 4096
> vfs.zfs.vdev.bio_flush_disable: 0
> vfs.zfs.vdev.bio_delete_disable: 0
> vfs.zfs.vdev.trim_max_bytes: 2147483648
> vfs.zfs.vdev.trim_max_pending: 64
> vfs.zfs.max_auto_ashift: 13
> vfs.zfs.zil_replay_disable: 0
> vfs.zfs.cache_flush_disable: 0
> vfs.zfs.zio.use_uma: 1
> vfs.zfs.zio.exclude_metadata: 0
> vfs.zfs.sync_pass_deferred_free: 2
> vfs.zfs.sync_pass_dont_compress: 5
> vfs.zfs.sync_pass_rewrite: 2
> vfs.zfs.snapshot_list_prefetch: 0
> vfs.zfs.super_owner: 0
> vfs.zfs.debug: 0
> vfs.zfs.version.ioctl: 3
> vfs.zfs.version.acl: 1
> vfs.zfs.version.spa: 5000
> vfs.zfs.version.zpl: 5
> vfs.zfs.trim.enabled: 1
> vfs.zfs.trim.txg_delay: 32
> vfs.zfs.trim.timeout: 30
> vfs.zfs.trim.max_interval: 1
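One more data point that may help with Q2: the ghost lists above are
non-empty (vfs.zfs.mru_ghost_metadata_lsize is ~6.7 GB), and ghost
entries record recently-evicted buffers, so watching those counters
while the Ruby traversal and the periodic 'find' compete would show
whether one job is displacing the other's metadata. A crude loop like
this would do (again a sketch; vfs.numvnodes is included because vnode
recycling against your kern.maxvnodes setting could matter here too):

  # sample eviction-related counters every 10 seconds while both jobs
  # run; steadily growing ghost sizes mean cached entries are being
  # evicted and then missed
  while sleep 10; do
      date
      sysctl vfs.zfs.mru_ghost_size vfs.zfs.mfu_ghost_size \
             vfs.zfs.arc_meta_used vfs.numvnodes
  done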
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
