On Wed, Apr 30, 2014 at 11:54 AM, Andy Ritger <[email protected]> wrote:
> Sorry for the very slow response to this, Ilia.
>
> For the specific error you mentioned: the error code
> 0x51 is "ErrorSrcLineExceedsPitch", and error code 0x53 is
> "ErrorDstLineExceedsPitch". It looks like class 0x9039 will generate
> those errors under the following conditions:
>
>     if ((NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT == PITCH) &&
>         (NV9039_LAUNCH_DMA_SRC_INLINE == FALSE) &&
>         (NV9039_LINE_COUNT_VALUE > 1) &&
>         (NV9039_PITCH_IN_VALUE >= 0) &&
>         (NV9039_LINE_LENGTH_IN_VALUE > NV9039_PITCH_IN_VALUE)) {
>         return ErrorSrcLineExceedsPitch;
>     }
>
>     if ((NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT == PITCH) &&
>         (NV9039_LINE_COUNT_VALUE > 1) &&
>         (NV9039_PITCH_OUT_VALUE >= 0) &&
>         (NV9039_LINE_LENGTH_IN_VALUE > NV9039_PITCH_OUT_VALUE)) {
>         return ErrorDstLineExceedsPitch;
>     }
>
> Where those NV9039_* method values are defined as:
>
> #define NV9039_LAUNCH_DMA                               0x0300
> #define NV9039_LAUNCH_DMA_SRC_INLINE                    0:0
> #define NV9039_LAUNCH_DMA_SRC_INLINE_FALSE              0x00000000
> #define NV9039_LAUNCH_DMA_SRC_INLINE_TRUE               0x00000001
> #define NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT             4:4
> #define NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT_BLOCKLINEAR 0x00000000
> #define NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT_PITCH       0x00000001
> #define NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT             8:8
> #define NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT_BLOCKLINEAR 0x00000000
> #define NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT_PITCH       0x00000001
>
> #define NV9039_PITCH_IN                                 0x0314
> #define NV9039_PITCH_IN_VALUE                           31:0
>
> #define NV9039_PITCH_OUT                                0x0318
> #define NV9039_PITCH_OUT_VALUE                          31:0
>
> #define NV9039_LINE_LENGTH_IN                           0x031c
> #define NV9039_LINE_LENGTH_IN_VALUE                     31:0
>
> #define NV9039_LINE_COUNT                               0x0320
> #define NV9039_LINE_COUNT_VALUE                         31:0

Very helpful info, thanks! That should help narrow down the source of
the problem.

>
> As far as I can tell, these checks are not GF106-specific, so I'm not
> sure why the problem is only showing up there. Maybe there is something
> else unique about the GF106 user's configuration that causes this to
> be triggered?

Perhaps. I've also observed that different GPUs are differently
sensitive to invalid values. For example, we had a bug that manifested
itself in G80-G94 yelling at us about out-of-bounds X/Y coordinates,
while G96+ happily took the illegal values (and probably did nasty
things with them, like overwriting memory it wasn't supposed to touch).
It is odd that _only_ GF106 would have that logic, but... whatever. I'm
also missing GF104, GF110, and GF117 results, so who knows, perhaps they
would have reported the issue as well.

I guess another possibility I hadn't previously considered is that this
user's GF106 could just be somehow busted. His is the only one I know
of, so I couldn't cross-check with a different one. But the problem is
sufficiently restricted that it seems unlikely to be a bad part, and
more likely a driver bug.

Anyways, now that we know what to look for, it should be much easier to
identify in a command stream dump.

Thanks again,

  -ilia

>
> Thanks,
> - Andy
>
>
> On Tue, Mar 18, 2014 at 06:44:30AM -0700, Ilia Mirkin wrote:
>> Hello,
>>
>> A user on an NVC3 card (GF106) is running into data errors on m2mf
>> (class 0x9039) that we haven't seen before:
>>
>> http://people.freedesktop.org/~imirkin/nvc0-comparison/nvc3-2014-03-17-agashlin/glean/fbo.html
>> http://people.freedesktop.org/~imirkin/nvc0-comparison/nvc3-2014-03-17-agashlin/spec/!OpenGL%201.1/copyteximage%201D.html
>>
>> Specifically the data errors 0x51 and 0x53, when running method 0x300
>> ("EXEC"). Any chance you could let us know what those errors are? (Or,
>> even better, provide the full table so that we'll have a better idea
>> in future cases as well.)
>>
>> Here are a few that we know about, so you know exactly what table I'm
>> talking about (our full list at
>> https://github.com/envytools/envytools/blob/master/rnndb/nv50_defs.xml#L192):
>>
>> 0x04: INVALID_VALUE
>> 0x05: INVALID_ENUM
>> 0x08: INVALID_OBJECT
>> 0x0c: INVALID_BITFIELD
>> 0x3f: PRIMITIVE_ID_NEEDS_GP
>>
>> We read this data error value from mmio reg 0x400110.
>>
>> Furthermore, if you could provide any insight as to why we would see
>> those errors on GF106 but not any other Fermi/Kepler that we've tested
>> (which should all run exactly the same code paths), that would be
>> extremely helpful as well. You can see the Fermi piglit runs we have
>> on file at
>> http://people.freedesktop.org/~imirkin/nvc0-comparison/problems.html
>>
>> Thanks,
>>
>> -ilia
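
For anyone scanning a command stream dump for the conditions Andy
describes, the two checks can be sketched in C roughly as follows. This
is only an illustration of the quoted pseudocode, not the hardware's
actual logic. The struct, function, and enum names are hypothetical, the
value 0 for "no error" is an assumption, and the pitch values are
assumed to be interpreted as signed 32-bit integers, since otherwise the
quoted ">= 0" test would always pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data error codes named in the thread; ERR_NONE is a hypothetical value. */
enum m2mf_data_error {
    ERR_NONE                   = 0x00,
    ERR_SRC_LINE_EXCEEDS_PITCH = 0x51,
    ERR_DST_LINE_EXCEEDS_PITCH = 0x53,
};

/* Bit positions taken from the quoted NV9039_LAUNCH_DMA field ranges. */
#define LAUNCH_DMA_SRC_INLINE(v)        (((v) >> 0) & 1u) /* 0:0 */
#define LAUNCH_DMA_SRC_MEMORY_LAYOUT(v) (((v) >> 4) & 1u) /* 4:4 */
#define LAUNCH_DMA_DST_MEMORY_LAYOUT(v) (((v) >> 8) & 1u) /* 8:8 */
#define MEMORY_LAYOUT_PITCH             1u

/* Hypothetical snapshot of the method values seen before LAUNCH_DMA. */
struct m2mf_state {
    uint32_t launch_dma;     /* method 0x0300 */
    int32_t  pitch_in;       /* method 0x0314, assumed signed */
    int32_t  pitch_out;      /* method 0x0318, assumed signed */
    uint32_t line_length_in; /* method 0x031c */
    uint32_t line_count;     /* method 0x0320 */
};

/* True when a non-negative pitch is smaller than the line length. */
static bool line_exceeds_pitch(uint32_t line_length, int32_t pitch)
{
    return pitch >= 0 && line_length > (uint32_t)pitch;
}

/* Mirrors the two quoted conditions, in the order they were given. */
static enum m2mf_data_error m2mf_check_launch(const struct m2mf_state *s)
{
    if (LAUNCH_DMA_SRC_MEMORY_LAYOUT(s->launch_dma) == MEMORY_LAYOUT_PITCH &&
        LAUNCH_DMA_SRC_INLINE(s->launch_dma) == 0 &&
        s->line_count > 1 &&
        line_exceeds_pitch(s->line_length_in, s->pitch_in))
        return ERR_SRC_LINE_EXCEEDS_PITCH;

    /* As quoted, the destination check also uses LINE_LENGTH_IN. */
    if (LAUNCH_DMA_DST_MEMORY_LAYOUT(s->launch_dma) == MEMORY_LAYOUT_PITCH &&
        s->line_count > 1 &&
        line_exceeds_pitch(s->line_length_in, s->pitch_out))
        return ERR_DST_LINE_EXCEEDS_PITCH;

    return ERR_NONE;
}
```

Feeding each LAUNCH_DMA from a dump through m2mf_check_launch(), with
the most recent pitch, line length, and line count values, should flag
the submissions that a chip enforcing these checks would reject. Note
that neither check can fire when the line count is 1, so a wrong pitch
on single-line copies would go unnoticed.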
