On 04/15/2014 12:18 PM, Eric Anholt wrote:
> Kenneth Graunke <kenn...@whitecape.org> writes:
>
>> On 04/14/2014 05:33 PM, Eric Anholt wrote:
>>> This manifested as rendering failures or sometimes GPU hangs in
>>> compositors when they accidentally got MSAA visuals due to a bug in the X
>>> Server.  Today we decided that the problem in compositors was equivalent
>>> to a corruption bug we'd noticed recently in resizing MSAA-visual
>>> glxgears, and debugging got a lot easier.
>>>
>>> When we allocate our MCS MT, libdrm takes the size we request, aligns it
>>> to Y tile size (blowing it up from 300x300=90000 bytes to 384*320=122880
>>> bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
>>> 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to it.
>>> When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
>>> data gets stored to the first 20 or so scanlines of each of the 3 tiled
>>> pages in that row, even though only 2 of those pages were allocated by
>>> libdrm.
>>
>> What?
>>
>> I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
>> drm_intel_bo where bo->size is larger than what you asked for, due to
>> the BO cache.  But...what you're saying is, it doesn't actually allocate
>> enough pages to back the whole bo->size it gives you?  So, if you write
>> bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's
>> really difficult to detect?
>
> You have that many pages, really.  But you've attached a fence to it, so
> your allocated pages are structured as:
>
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |
> +---+---+
>
> (except taller in this specific example).  If you hit the pixels in
> those quads, you'll be fine.
>
>>
>> There are other places where we memset an entire BO using bo->size.  For
>> example, your INTEL_DEBUG=shader_time code does exactly that (though it
>> isn't tiled).
>>
>> Could we change libdrm to set bo->size to the actual usable size of the
>> buffer, rather than the bucket size?
>
> The pages containing pixels you asked for go to 122880, and the BO is
> 131072, but the pixels you asked for have a maximum linear address of
> 384*300=115200.  Which value are you thinking is the "actual usable
> size"?  We certainly shouldn't have been memsetting more pixels than
> 115200.

115200, I guess - the maximum linear address.  Because if I map it
through a fence, and try to access beyond that, the tiling may put it
beyond the page containing pixels I asked for, which is dangerous.

Maybe it doesn't need to change---really, I'm wondering how to prevent
this kind of problem in the future, since it seems to be an easy mistake
to make, and I know a bunch of us read that code many times and didn't
spot the problem.

Perhaps there's a way to teach valgrind about the maximum linear address
when mapping through a fence, so it can complain about accesses beyond
that?  While that wouldn't prevent us from writing bad code, it would
certainly help detect it.

--Ken
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev