Hello: I am looking at a possible mem corruption issue, and am trying to make sense out of the kmem log info. This is a non-debug kernel with kmem_flags set to 0xf (str_ftnever is 1).
The problem seems to be that freemsg() tries to free an already freed dblk. The mp (in the core) has a valid b_datap (dblk), while the registers holding the dblk info. from the mp, have 0xdeadbeefdeadbeef. So, I am not sure if somehow the mblk has been reallocated to get a new dblk. > $c freemsg+0x79() mir_close+0x48() rmm_close+0x11() qdetach+0x82() strclose+0x3e4() device_close+0xf0() spec_close+0x178() fop_close+0x2c() closef+0x62() closeandsetf+0x246() close+0xb() sys_syscall32+0x101() > $r %rax = 0x0000000000000001 %r9 = 0x000000000000000f %rbx = 0xdeadbeefdeadbeef %r10 = 0x0000000000000000 %rcx = 0xffffffff869f6800 %r11 = 0x0000000000000007 %rdx = 0xfffffe80e08e0800 %r12 = 0xdeadbeefdeadbeef %rsi = 0xdeadbeefdeadbeef %r13 = 0xfffffe8635459558 %rdi = 0xfffffe80e32a25c0 %r14 = 0xdeadbeefdeadbeef %r8 = 0xfffffe8000b1ef10 %r15 = 0x0000000000000000 %rip = 0xfffffffffb9e62e9 freemsg+0x79 %rbp = 0xfffffe8000b1ec50 %rsp = 0xfffffe8000b1ec30 %rflags = 0x00010246 id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0 status=<of,df,IF,tf,sf,ZF,af,PF,cf> freemsg+0xb: movq %rdi,%rbx [...] freemsg+0x28: movq 0x28(%rbx),%r12 (b_datap) freemsg+0x2c: movq 0x10(%rbx),%r14 (b_cont) and freemsg+0x79 is: call *0x30(%r12) (call to db_free) given: freemsg+0x70: movq %rbx,%rdi freemsg+0x73: movq %r14,%rbx freemsg+0x76: movq %r12,%rsi I am assuming that %rdi is the mblk in question. > 0xfffffe80e32a25c0::print mblk_t b_datap b_cont b_datap = 0xffffffffa9806600 b_cont = 0xffffffff8733ca80 > 0xffffffffa9806600::whatis ffffffffa9806600 is ffffffffa9806600+0, bufctl ffffffffa981bb18 allocated from streams_dblk_80 I am not sure if %rcx could be the old b_datap, since using its bufctl: > ffffffff8697b7d8::bufctl -v ADDR BUFADDR TIMESTAMP THREAD CACHE LASTLOG CONTENTS ffffffff8697b7d8 ffffffff869f6800 64361491591 fffffe80e08e08e0 ffffffff80067008 ffffffff814fd2c0 ffffffff81e60e20 kmem_cache_free_debug+0xf4 kmem_cache_free+0x43 dblk_lastfree+0x3d freemsg+0x7e mir_close+0x48 rmm_close+0x11 qdetach+0x82 strclose+0x3e4 device_close+0xf0 spec_close+0x178 fop_close+0x2c closef+0x62 closeandsetf+0x246 close+0xb In looking at the log for the new dblk - 0xffffffffa9806600 - I see the following entries (using ::walk kmem_log | ::bufctl -va <0xffffffffa9806600's bufctl>) [...] ffffffff807f2d40 ffffffffa9806600 642a445e35e fffffe8123a693a0 ffffffff80065008 ffffffff807e4f40 ffffffff82949d60 kmem_cache_free_debug+0xf4 kmem_cache_free+0x43 dblk_lastfree+0x3d freemsg+0x7e tcp_rput_data+0x16ff squeue_drain+0xf0 squeue_enter+0xb1 tcp_wput+0x86 putnext+0x1f1 mir_wput+0x153 rmm_wput+0x11 put+0x1b0 svc_cots_ksend+0x8d svc_sendreply+0x48 common_dispatch+0x3b9 ffffffff807e47c0 ffffffffa9806600 642a42605d8 fffffe8123a693a0 ffffffff80065008 ffffffff807c6400 ffffffff82944e10 kmem_cache_free_debug+0xf4 kmem_cache_free+0x43 dblk_lastfree+0x3d freemsg+0x7e tcp_rput_data+0x16ff squeue_drain+0xf0 squeue_enter+0xb1 tcp_wput+0x86 putnext+0x1f1 mir_wput+0x153 rmm_wput+0x11 put+0x1b0 svc_cots_ksend+0x8d svc_sendreply+0x48 common_dispatch+0x3b9 ffffffff807c6400 ffffffffa9806600 642a3d90aca fffffe80000b3c80 ffffffff80065008 ffffffff807b7040 ffffffff82935908 kmem_cache_alloc_debug+0x1fa kmem_cache_alloc+0x6f allocb+0x77 nge_recv_packet+0x9a nge_recv_ring+0x161 nge_receive+0x31 nge_intr_handle+0x13d nge_chip_intr+0x7c av_dispatch_autovect+0x78 [...] (there are quite a few transactions before and after these) a ::kmem_log (for ffffffffa9806600) has the following (strictly in sequence) [...] ffffffff807f2d40 ffffffffa9806600 642a445e35e fffffe8123a693a0 ffffffff807e47c0 ffffffffa9806600 642a42605d8 fffffe8123a693a0 ffffffff807c6400 ffffffffa9806600 642a3d90aca fffffe80000b3c80 [...] What I am wondering is if the log above is complete, i.e. if there are 2 frees for the dblk or if the logging info. is missing an alloc in between. I suppose it is the latter? since the second free would have otherwise panic'd the system (even a non-debug). Another reason to suspect that I am missing some log records is when I look at the result of "::walk kmem_log | ::bufctl -va <0xfffffe80e32a25c0's bufctl>" which has: [...] ffffffff80f94f40 fffffe80e32a25c0 64308cdc495 fffffe88eccdec00 ffffffff80063008 ffffffff80f86000 ffffffff81c7a300 kmem_cache_free_debug+0xf4 kmem_cache_free+0x43 dblk_destructor+0x25 kmem_cache_free_debug+0x1e7 kmem_cache_free+0x43 dblk_lastfree+0x3d freemsg+0x7e tcp_rput_data+0x16ff squeue_drain+0xf0 squeue_enter+0xb1 tcp_wput+0x86 putnext+0x1f1 mir_wput+0x153 rmm_wput+0x11 put+0x1b0 ffffffff80f86000 fffffe80e32a25c0 64308ad8cf9 fffffe80000bfc80 ffffffff80063008 ffffffff80f85f40 ffffffff81c74da0 kmem_cache_alloc_debug+0x1fa kmem_cache_alloc+0x6f dblk_constructor+0x36 kmem_cache_alloc_debug+0x276 kmem_cache_alloc+0x6f allocb+0x77 nge_recv_packet+0x9a nge_recv_ring+0x161 nge_receive+0x31 nge_intr_handle+0x13d nge_chip_intr+0x7c av_dispatch_autovect+0x78 ffffffff80f85b80 fffffe80e32a25c0 64308ac93b4 fffffe88eccdec00 ffffffff80063008 ffffffff80f85ac0 ffffffff81c74c60 kmem_cache_alloc_debug+0x1fa kmem_cache_alloc+0x6f dblk_constructor+0x36 kmem_cache_alloc_debug+0x276 kmem_cache_alloc+0x6f allocb+0x77 allocb_tmpl+0x11 copyb+0x72 copymsg+0x21 copymsgchain+0x28 i_dls_link_ether_loopback+0xd9 mac_txloop+0x91 dls_tx+0xe str_mdata_fastpath_put+0x1f tcp_send_data+0x650 [...] (there are quite a few transactions before and after these). I can't see how there could be two alloc's for the mblk without a free somewhere in between. a ::kmem_log has the following for fffffe80e32a25c0 (strictly in sequence): [...] ffffffff80f94f40 fffffe80e32a25c0 64308cdc495 fffffe88eccdec00 ffffffff80f86000 fffffe80e32a25c0 64308ad8cf9 fffffe80000bfc80 ffffffff80f85b80 fffffe80e32a25c0 64308ac93b4 fffffe88eccdec00 [...] There are no alloc failures in the core. kmem_verify also states everything is clean. Basically, I am not sure how complete/reliable the logging info. is and if I can conclude that there is a problem just by looking at the logs above (I guess I can't, but want to confirm). If more info. is needed, I can include. thanks, in advance, -venu