Hi Everyone, I have a couple of systems running opensolaris b118, one of which sends hourly snapshots to the other. This has been working well, however as of today, the receiving zfs process has started running extremely slowly, and is running at 100% CPU on one core, completely in kernel mode. A little bit of exploration with lockstat and dtrace seems to imply that the issue is around the "dbuf_free_range" function - or at least, that's what it looks like to my inexperienced eye! The system is very unresponsive while this problem is occurring, with frequent multi-second delays between my typing into an ssh session and getting a response. Has anyone seen this before, or know if there's anything I can do about it? Lockstat says: Adaptive mutex hold: 52299 events in 1.694 seconds (30873 events/sec) ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Lock Caller 50 87% 87% 0.00 23410134 0xffffff0273d6dcb8 dbuf_free_range+0x268 nsec ------ Time Distribution ------ count 33554432 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 50 ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Lock Caller 53 0% 87% 0.00 117124 0xffffff025f404140 txg_rele_to_quiesce+0x18 nsec ------ Time Distribution ------ count 131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 53 ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Lock Caller 2536 0% 88% 0.00 2194 0xffffff024f09bd40 kmem_cache_alloc+0x84 nsec ------ Time Distribution ------ count 2048 |@@@@@@@@@@@@@ 1121 4096 |@@@@@@@@@@@@@@@@ 1383 8192 | 0 16384 | 32 ------------------------------------------------------------------------------- Count indv cuml rcnt nsec Lock Caller 415 0% 88% 0.00 7968 0xffffff025120ce00 anon_resvmem+0xb4 [snip] I modified the "procsystime" script from the dtrace toolkit to trace fbt:zfs::entry/return rather than syscalls, and I get an output like this (approximately a 5sec sample): Function nsecs l2arc_write_eligible 46155456 spa_last_synced_txg 49211711 zio_inherit_child_errors 50039132 zil_commit 50137305 buf_hash 50839858 dbuf_rele 56388970 zio_wait_for_children 67533094 dbuf_update_data 70302317 dsl_dir_space_towrite 72179496 parent_delta 77685327 dbuf_hash 79075532 free_range_compar 82479196 dbuf_free_range 2715449416 TOTAL: 4156078240 Thanks, Tristan |
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss