Garrett D'Amore wrote:

 > Is this on SPARC or x86 hardware?
 >
 > It *sounds* sort of like it might be a problem with corruption of DMA.
 > Make sure that when you do m_stop, you've really shut down your hardware
 > including any DMA transfers *before* you yank the DMA mappings out from
 > underneath it.   (I can imagine in particular a DMA region getting
 > reused, and if your device is still accessing that region, then problems
 > could ensue.)

This is x86.

I switched to using /usr/dict/words as the netperf fill file, which
makes it a bit easier to distinguish between something getting
zeroed and something getting clobbered with what I'm sending.
Now I'm seeing roughly a page of data from that file splattered
across various locations in memory.

The crash was a GPF:
 > $C
ffffff0008181430 impl_acc_hdl_free+0x1e(ffffff01e1275680)
ffffff0008181460 ddi_dma_mem_free+0x2f(ffffff01f695ade8)
ffffff0008181480 myri10ge_dma_free+0x1d()
ffffff00081814c0 myri10ge_unprepare_tx_ring+0x4b()
ffffff00081814f0 myri10ge_teardown_slice+0x3f()
ffffff0008181530 myri10ge_stop_locked+0x6c()
ffffff0008181550 myri10ge_m_stop+0x6b()
ffffff0008181580 mac_stop+0x47(ffffff01d0082bb8)
<...>

The driver was tearing down its pre-mapped DMA transmit
buffers, and tripped over a DMA access handle (ddi_acc_hdl_t)
that had been overwritten with the contents of /usr/dict/words:

 > ffffff01e1275680::print  ddi_acc_hdl_t
{
     ah_vers = 0x6e6f4141
     ah_bus_private = 0x676f6c6f6f7a0a6f
     ah_platform_private = 0x5a0a6d6f6f7a0a79
<....>

That is the data I was sending.  E.g.:
0xffffff01e1275680: 
AAone\nzoo\nzoology\nzoom\nZorn\nZoroaster\nZoroastrian\nzounds\nz's\nzu
cch\nzippy\nzircon\nzirconium\nzloty\nzodiac\nzodiacal\nZoe\nZomba\nzombie\nzing\nzigzag\nzigzagging\nzi
lch\nZimmerman\nzinc\nzing\nZion\nZionism\nzip\nzenith\nzero\nzeroes\nzeroth\nzest\nzesty\nzeta\nZeus\nZ
iegler\nzig\nziggzap\nzazen\nzeal\nZealand\nzealot\nzealous\nzebra\nZeiss\nZellerbach\nZen
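
To see how the structure fields above line up with this text, the
values can be decoded byte by byte. The following is a minimal userland
sketch, not driver code; the helper name le_bytes_to_text is
hypothetical, and it assumes little-endian byte order, as on x86.
Newlines are shown as '|' to keep the output on one line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Render the low `len` bytes of `v` in memory order for a little-endian
 * machine (lowest-order byte first). Newlines become '|', other
 * non-printable bytes become '.'. `out` must hold len + 1 chars.
 */
static void le_bytes_to_text(uint64_t v, size_t len, char *out)
{
    assert(len <= 8);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)(v >> (8 * i));
        if (c == '\n')
            out[i] = '|';
        else if (c >= 0x20 && c < 0x7f)
            out[i] = (char)c;
        else
            out[i] = '.';
    }
    out[len] = '\0';
}
```

Decoding ah_vers (4 bytes) gives "AAon", ah_bus_private gives
"o|zoolog", and ah_platform_private gives "y|zoom|Z", which matches
"AAone\nzoo\nzoology\nzoom\nZorn" at offsets 0, 8 and 16 of the dump.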

This continues on through the beginning of that page:
0xffffff01e1275000:      
M\nacetone\nacetylene\nache\nachieve\nAchilles\naching\nachromatic\nacid
<....>

I don't think this is DMA-related corruption, because the stray data
is what I'm transmitting.  On the receive side I should only be
getting ACKs.

As near as I can figure, there are two ways I could be screwing this up:

1) copying this data way out of bounds in my tx copy routine
2) somehow freeing the DMA handle twice, so that its memory ends up
    allocated to a dblk and then gets overwritten.
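
One way to rule out case 1 is to put an explicit bounds check in front
of the copy. The following is only a userland sketch under assumed
names (demo_tx_copy and its parameters are hypothetical and are not
taken from the myri10ge driver); in a kernel driver the failure branch
would more likely be an ASSERT or a cmn_err call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `len` bytes from `src` into a fixed-size transmit buffer at
 * offset `off`. Returns false, and copies nothing, if the copy would
 * run past the end of the buffer.
 */
static bool demo_tx_copy(char *buf, size_t buf_size, size_t off,
    const void *src, size_t len)
{
    /* Written this way so that off + len cannot overflow. */
    if (off > buf_size || len > buf_size - off)
        return false;
    memcpy(buf + off, src, len);
    return true;
}
```

If the check never fires under load, an out-of-bounds copy in this
path becomes much less likely.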

I'm leaning towards #2.
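
A common guard against case 2 is to clear the stored handle as soon as
it is freed, so that a second teardown call is detected instead of
freeing memory that may already belong to someone else. The sketch
below simulates this in userland with malloc/free; struct demo_dma and
demo_dma_free are hypothetical names, and the real driver would be
calling the DDI free routines at the marked point.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's per-buffer DMA bookkeeping. */
struct demo_dma {
    void *mem;  /* stands in for the access handle and its memory */
};

/*
 * Free the buffer once. Returns true if something was freed, false if
 * the buffer was already freed (a double free caught by the guard).
 */
static bool demo_dma_free(struct demo_dma *d)
{
    if (d->mem == NULL)
        return false;   /* second call: report it, free nothing */
    free(d->mem);       /* a real driver would free its DMA memory here */
    d->mem = NULL;      /* clear so a repeat call is detectable */
    return true;
}
```

Logging or asserting whenever demo_dma_free returns false would show
whether the teardown path runs twice for the same buffer.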

I've tried setting kmem_flags to 0xf, and it doesn't seem to
change anything, except that the crash seems to take longer to
trigger.

Any ideas would be welcome.  Is there any way I can map the
DMA access handles read-only, so that if somebody writes to them
the system will panic?

Thanks,

Drew
_______________________________________________
networking-discuss mailing list