On 2/4/14, 10:15 PM, Andriy Gapon wrote: > on 04/02/2014 19:32 Andriy Gapon said the following: >> I suggested the following diagnostic patch: >> http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.patch >> >> With the patch applied there is the following panic (sorry for poor quality): >> http://oi59.tinypic.com/sfctvc.jpg >> >> I am trying to get more details. > > So, I've just tested the above patch myself and I have also caught a panic: > (kgdb) bt > #0 doadump (textdump=1) at pcpu.h:234 > #1 0xffffffff808ea45a in kern_reboot (howto=260) at > /usr/src/sys/kern/kern_shutdown.c:449 > #2 0xffffffff808e9c73 in panic (fmt=<value optimized out>) at > /usr/src/sys/kern/kern_shutdown.c:637 > #3 0xffffffff81d9216d in assfail (a=<value optimized out>, f=<value optimized > out>, l=<value optimized out>) at > /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81 > #4 0xffffffff81ae0385 in arc_release (buf=0xfffffe01420d24c8, > tag=0xfffffe004f4637e0) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3681 > #5 0xffffffff81aed68a in dbuf_dirty (db=0xfffffe004f4637e0, > tx=0xfffffe03707e6000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1181 > #6 0xffffffff81aee68a in dbuf_will_dirty (db=0xfffffe004f4637e0, > tx=0xfffffe03707e6000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:1419 > #7 0xffffffff81af4cfc in dmu_write (os=<value optimized out>, object=<value > optimized out>, offset=<value optimized out>, size=8, buf=0xffffff9de0a0f328, > tx=0xfffffe03707e6000) > at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:822 > #8 0xffffffff81b3f430 in spa_history_write (spa=0xffffff8cd9acf000, > buf=0xffffff9de0a0f328, len=8, shpp=0xfffffe004e1b4a00, tx=0xfffffe03707e6000) > at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:169 > #9 0xffffffff81b3fdb1 in spa_history_log_sync (arg=<value optimized out>, > tx=0xfffffe03707e6000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:267 > #10 0xffffffff81b3ff4c in log_internal (nvl=0xfffffe067862d700, > operation=0xffffffff81c1777c "snapshot", spa=0xffffff8cd9acf000, > tx=0xfffffe03707e6000, fmt=0xffffffff81c1777b "", adx=<value optimized out>) > at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:465 > #11 0xffffffff81b401c0 in spa_history_log_internal_ds (ds=0xfffffe0115de7c00, > operation=0xffffffff81c1777c "snapshot", tx=0xfffffe03707e6000, > fmt=0xffffffff81c1777b "") > at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:513 > #12 0xffffffff81b11b8a in dsl_dataset_snapshot_sync_impl > (ds=0xfffffe0074fe8000, > snapname=0xfffffe012e0157c2 "1391545232610_xxx", tx=0xfffffe03707e6000) > at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:1193 > #13 0xffffffff81b140b3 in dsl_dataset_snapshot_sync (arg=<value optimized > out>, > tx=0xfffffe03707e6000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:1214 > #14 0xffffffff81b1ace6 in dsl_sync_task_sync (dst=0xffffff9de178e550, > tx=0xfffffe03707e6000) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c:170 > #15 0xffffffff81b19f6b in dsl_pool_sync (dp=0xfffffe002f6ec000, txg=7020757) > at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:597 > #16 0xffffffff81b37dc1 in spa_sync (spa=0xffffff8cd9acf000, txg=7020757) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6516 > #17 0xffffffff81b46f38 in txg_sync_thread (arg=<value optimized out>) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:518 > #18 0xffffffff808b866a in fork_exit (callout=0xffffffff81b46c00 > <txg_sync_thread>, arg=0xfffffe002f6ec000, frame=0xffffff9de0a0fb00) at > /usr/src/sys/kern/kern_fork.c:990 > #19 0xffffffff80ccef8e in fork_trampoline () at > /usr/src/sys/amd64/amd64/exception.S:606 > #20 0x0000000000000000 in ?? () > (kgdb) fr 4 > #4 0xffffffff81ae0385 in arc_release (buf=0xfffffe01420d24c8, > tag=0xfffffe004f4637e0) at > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3681 > 3681 ASSERT(l2hdr->b_compress == ZIO_COMPRESS_OFF || > (kgdb) list > 3676 hdr->b_size, 0); > 3677 /* > 3678 * We are destroying l2hdr, so ensure that > 3679 * its compressed buffer, if any, is not leaked. > 3680 */ > 3681 ASSERT(l2hdr->b_compress == ZIO_COMPRESS_OFF || > 3682 l2hdr->b_tmp_cdata == NULL); > 3683 kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t)); > 3684 arc_space_return(sizeof (l2arc_buf_hdr_t), > ARC_SPACE_L2HDRS); > 3685 ARCSTAT_INCR(arcstat_l2_size, -buf_size); > (kgdb) p *l2hdr > $1 = {b_dev = 0xfffffe002f33e500, b_daddr = 402763776, b_compress = > ZIO_COMPRESS_LZ4, b_asize = 12288, b_tmp_cdata = 0xffffff8cf2c50000} > (kgdb) p *hdr > $2 = {b_dva = {dva_word = {0, 0}}, b_birth = 0, b_cksum0 = 0, b_freeze_lock = > {lock_object = {lo_name = 0xffffffff81c14ad4 "buf->b_freeze_lock", lo_flags = > 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, > b_freeze_cksum = 0x0, b_thawed = 0xfffffe012bd2b160, b_hash_next = 0x0, > b_buf > = 0xfffffe01420d24c8, b_flags = 65632, b_datacnt = 1, b_acb = 0x0, b_cv = > {cv_description = 0xffffffff81c14ac8 "buf->b_cv)", cv_waiters = 0}, > b_type = ARC_BUFC_METADATA, b_size = 131072, b_spa = 1333901952715475381, > b_state = 0xffffffff81c2da00, b_arc_node = {list_next = 0x0, list_prev = 0x0}, > b_arc_access = 0, b_refcnt = {rc_mtx = {lock_object = { > lo_name = 0xffffffff81c18b87 "rcmtx", lo_flags = 16973824, lo_data = > 0, > lo_witness = 0x0}, mtx_lock = 4}, rc_tracked = 0, rc_list = {list_size = 40, > list_offset = 0, list_head = {list_next = 0xfffffe007a267dc0, > list_prev = 0xfffffe007a267dc0}}, rc_removed = {list_size = 40, > list_offset = 0, list_head = {list_next = 0xfffffe007a267de0, list_prev = > 0xfffffe007a267de0}}, rc_count = 1, rc_removed_count = 0}, b_l2hdr = 0x0, > b_l2node = {list_next = 0x0, list_prev = 0x0}} > (kgdb) p/x *hdr > $3 = {b_dva = {dva_word = {0x0, 0x0}}, b_birth = 0x0, b_cksum0 = 0x0, > b_freeze_lock = {lock_object = {lo_name = 0xffffffff81c14ad4, lo_flags = > 0x2710000, lo_data = 0x0, lo_witness = 0x0}, sx_lock = 0x1}, b_freeze_cksum = > 0x0, > b_thawed = 0xfffffe012bd2b160, b_hash_next = 0x0, b_buf = > 0xfffffe01420d24c8, > b_flags = 0x10060, b_datacnt = 0x1, b_acb = 0x0, b_cv = {cv_description = > 0xffffffff81c14ac8, cv_waiters = 0x0}, b_type = 0x1, b_size = 0x20000, > b_spa = 0x1282f8c238ce49b5, b_state = 0xffffffff81c2da00, b_arc_node = > {list_next = 0x0, list_prev = 0x0}, b_arc_access = 0x0, b_refcnt = {rc_mtx = > {lock_object = {lo_name = 0xffffffff81c18b87, lo_flags = 0x1030000, > lo_data = 0x0, lo_witness = 0x0}, mtx_lock = 0x4}, rc_tracked = 0x0, > rc_list = {list_size = 0x28, list_offset = 0x0, list_head = {list_next = > 0xfffffe007a267dc0, list_prev = 0xfffffe007a267dc0}}, rc_removed = { > list_size = 0x28, list_offset = 0x0, list_head = {list_next = > 0xfffffe007a267de0, list_prev = 0xfffffe007a267de0}}, rc_count = 0x1, > rc_removed_count = 0x0}, b_l2hdr = 0x0, b_l2node = {list_next = 0x0, > list_prev = > 0x0}} > (kgdb) p hdr > $4 = (arc_buf_hdr_t *) 0xfffffe007a267cd0 > (kgdb) p buf->b_hdr > $5 = (arc_buf_hdr_t *) 0xfffffe007a267cd0 > (kgdb) info symbol hdr->b_state > ARC_anon in section .bss > > b_flags value translates to ARC_L2CACHE | ARC_L2COMPRESS | ARC_L2_WRITING if I > am not mistaken. b_compress is ZIO_COMPRESS_LZ4. > To me this looks like at least arc_release() can definitely be called > concurrently with l2arc_write_buffers() and both can be acting on the same > buffer. If arc_release() "wins" then l2hdr->b_tmp_cdata is leaked. >
Ah, now I understand what's going on. The application is dirtying a dbuf who's backing ARC buffer is currently in-flight into the L2ARC, so it needs to be discarded. Instead of going through arc_evict, it calls arc_release directly to dispose of the buffer, but it doesn't clear the in-flight compressed data. So the solution would be obvious: if there's a compressed buffer, call l2arc_release_cdata_buf to clear it. -- Saso _______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
