On Thu, Jun 3, 2010 at 9:41 AM, Roland Scheidegger <[email protected]> wrote: > On 03.06.2010 13:51, Nicolai Haehnle wrote: >> On Thu, Jun 3, 2010 at 7:45 AM, Tom Stellard <[email protected]> wrote: >>> On Wed, Jun 02, 2010 at 11:39:57AM +0200, Nicolai Haehnle wrote: >>>> On Wed, Jun 2, 2010 at 6:53 AM, Tom Stellard <[email protected]> wrote: >>>>> On Tue, Jun 01, 2010 at 12:00:16PM +0200, Nicolai Haehnle wrote: >>>>>> On Tue, Jun 1, 2010 at 7:41 AM, tstellar <[email protected]> wrote: >>>>>>> From: Tom Stellard <[email protected]> >>>>>>> >>>>>>> This fixes bug: >>>>>>> >>>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=25109 >>>>>> Is this really correct? If I understand your patch correctly, what it >>>>>> does is that TEX instructions that depend on earlier TEX instructions >>>>>> will be emitted in the same TEX group on R300. So if you have >>>>>> dependent texture reads like this: >>>>>> >>>>>> MOV r0, something; >>>>>> TEX r1, r0, ...; >>>>>> TEX r2, r1, ...; >>>>>> >>>>>> You will now emit both TEX in the same TEX indirection block, which I >>>>>> thought was wrong, because the second TEX instruction will actually >>>>>> use the contents of r1 *before* the first TEX instruction as >>>>>> coordinates. At least that's how I thought the TEX hardware works: >>>>>> >>>>>> 1) Fetch all coordinates for all TEX instructions in an indirection block >>>>>> 2) Execute all TEX instructions in parallel >>>>>> 3) Store all results in the respective destination registers >>>>>> >>>>>> If my understanding is correct, then I believe your change miscompiles >>>>>> the above shader fragment. Can you clarify this? >>>>>> >>>>> It looks like I am the one who misunderstood how TEX instructions work. >>>>> You are right, the patch does miscompile your example. The shader >>>>> I was having problems with looked like this: >>>>> >>>>> 10: TEX temp[13].xyz, temp[12].xy__, 2D[0]; >>>>> 11: TEX temp[12].xyz, temp[11].xy__, 2D[0]; >>>>> 12: TEX temp[11].xyz, temp[10].xy__, 2D[0]; >>>>> 13: TEX temp[10].xyz, temp[9].xy__, 2D[0]; >>>>> 14: TEX temp[9].xyz, temp[8].xy__, 2D[0]; >>>>> 15: TEX temp[8].xyz, temp[7].xy__, 2D[0]; >>>>> 16: TEX temp[7].xyz, input[4].xy__, 2D[0]; >>>>> 17: TEX temp[6].xyz, temp[6].xy__, 2D[0]; >>>>> 18: TEX temp[5].xyz, temp[5].xy__, 2D[0]; >>>>> 19: TEX temp[4].xyz, temp[4].xy__, 2D[0]; >>>>> 20: TEX temp[3].xyz, temp[3].xy__, 2D[0]; >>>>> >>>>> I think the bug is actually in the pair scheduler's dataflow analysis. >>>>> Currently, the pair scheduler marks #10 as a dependency of #11, which >>>>> would be OK if these were ALU instructions, but it shouldn't be a >>>>> dependency for TEX instructions. >>>> Yes, I think what the dataflow analysis sees is that #10 reads >>>> temp[12] while #11 overwrites temp[12], so its belief is that #10 must >>>> be executed before #11. That's indeed a curious problem. >>>> >>>> So is this done after register allocation? Because it seems to me like >>>> the register dealiasing should rename one of the occurences of >>>> temp[12] in this example... >>>> >>> Currently, the pair scheduler runs before the register allocation pass. I >>> have >>> attached the compiler output for this shader to the original bug if you >>> want to take a look at it: >>> https://bugs.freedesktop.org/show_bug.cgi?id=25109 >> >> Thanks. I think I simply misremembered what's happening there. At some >> point, the compiler had a pass that would dealias register names, so >> that one of the temp[12]s would have been changed to something else, >> but clearly that's no longer the case. >> >> I'm unsure what the best solution is here. Reintroducing the >> dealiasing would certainly work. It would be interesting to know the >> exact semantics of the hardware unit. Is it really: First read all >> registers, then tex lookup, then write all results? > No you can't rely on that. I know because I made that error for the r300 > planar xv support - worked though but was fixed by Alex :-). > (commit 9091b3f5f13dbea83ffd89679dac600e9f280bb2). > >> In other words, is >> there a guarantee that all coordinates will be read before the first >> result is written? In that case, one could separate the "commit" phase >> of texture units: first commit all the coordinate reads in one batch, >> allowing new TEX instructions to become ready. Then finalize the block >> and commit all the writes. >> >> I've never done experiments to figure out what the exact semantics of >> the TEX blocks are, and I haven't seen any documentation that >> clarifies it. Back when I worked on the code I've always gone the safe >> route, i.e. assuming that both parallel and serial execution of TEX >> instruction is possible if the GPU feels like it, which reduced errors >> but unfortunately means that I'm none the wiser to this day. > > I guess it could very well be some mix of parallel and serial execution > in hardware. Maybe (due to hardware pipelining) it is even possible that > using an overwritten reg can still be used as source for, say, the next > 3 tex instructions, but that is highly speculative, might be asic > dependent and not even be true :-). (Interestingly, in the case it won't > work, this would imply you could have more than 4 dependent tex lookups, > as long as the tex result is consumed directly by another tex > instruction without any arithmetic in-between, by bundling them (far > apart?) in the same phase...). In any case, without some more > documentation, playing safe seems to be the only choice.
I'll ask the hw folks today. Alex _______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
