On Thu, Sep 6, 2012 at 11:32 AM, Alex Deucher <alexdeuc...@gmail.com> wrote:
> On Thu, Sep 6, 2012 at 10:54 AM, Jerome Glisse <j.gli...@gmail.com> wrote:
>> On Thu, Sep 6, 2012 at 6:20 AM, Dave Airlie <airl...@gmail.com> wrote:
>>> On Thu, Sep 6, 2012 at 5:21 PM, Philipp Klaus Krause <p...@spth.de> wrote:
>>>> On 06.09.2012 07:35, j.gli...@gmail.com wrote:
>>>>> From: Jerome Glisse <jgli...@redhat.com>
>>>>>
>>>>> To avoid GPU lockups, registers must be emitted in a specific order
>>>>> (no kidding ...). This patch reworks atom emission so that the order
>>>>> in which atoms are emitted with respect to each other is always the
>>>>> same. We don't have any information on what the correct order is, so
>>>>> the order will need to be inferred from the fglrx command stream.
>>>>
>>>> Shouldn't this be stated in comments, so the next person who comes along
>>>> and makes a change in this code doesn't inadvertently change the order?
>>>
>>> Also a comment on what ordering matters most, like I suspect this is
>>> just hiding a real issue.
>>>
>>> Dave.
>>
>> No, it's not hiding an issue; afaict it's how the hw works. The hw does
>> what some AMD documents call state validation. So here is how I
>> understand things, and I can be completely wrong. The hw processes
>> register writes in the order it receives them, and to avoid postponing
>> state validation it does the validation while processing registers. That
>> means that if writing register A triggers state validation that uses
>> some field of register B, the hw might not redo state validation when
>> register B is later written, i.e. only some registers trigger state
>> validation, no matter what they depend on. I believe state
>> validation is only used as a pipeline optimization by the hw, so the hw
>> knows it can take some shortcuts. But in some rare cases, if shortcuts
>> are taken for the wrong reasons, we end up with a GPU lockup.
>>
>> No matter whether my guess is right or wrong, I know for a fact that
>> register order is important in some situations; that's the hard bottom
>> line, whatever the reasons inside the hw.
>>
>> This patch is far from having all the order right; it's just a first
>> step. I am atomizing everything, and that's what is needed to go forward
>> without regressions.
>
> I've talked to the internal hw and sw guys and they said there isn't
> any specific ordering required and the closed driver doesn't impose
> any specific order.  The pipeline doesn't get kicked off until a draw
> command is issued, so I don't see why the state update order would
> matter.  It's possible there are subtle ordering requirements and the
> closed driver just happened to get it right.  There are dependencies
> and hw bug workarounds however.  E.g., some blocks snoop registers
> from other blocks so you need to make sure those dependent registers
> have been initialized before drawing.  I don't know if it's the
> ordering so much as making sure we emit all the necessary state when
> needed.  The closed driver tends to update a lot more state than is
> minimally required for a lot of things.  That said, it probably
> wouldn't hurt to mirror the closed driver more closely.
>
> Alex

I don't know what the reasons are, but which registers are emitted, and
alongside which other registers, definitely matters. All the files I
mention in this mail are located at:
http://people.freedesktop.org/~glisse/registerposition/

So if you apply :
0001-r600g-FORCE-LOCKUP-BY-EMITTING-OR-NOT-REGISTER.patch

and run the piglit tests as in lockup-longprim.sh, you will lock up the GPU
(I have only tested on r6xx and r7xx so far).

I double checked with automated tools that every register written by
the command stream of the longprim piglit test is reprogrammed
properly by the fbo test (if you have the constant buffer size patch I
sent earlier).
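
For illustration only, the kind of check described above could be sketched
as follows. The trace format here (a list of (register, value) writes) and
the names REG_A and stale_registers are assumptions made for the example;
this is not the format of the .txt dumps nor the tool actually used.

```python
def stale_registers(first_cs, second_cs):
    """Return registers written by first_cs that second_cs never rewrites.

    Both arguments are lists of (register_name, value) writes.
    This is a hypothetical representation, not the real dump format.
    """
    rewritten = {reg for reg, _ in second_cs}
    written = {reg for reg, _ in first_cs}
    return sorted(written - rewritten)


# Toy streams: every register touched by the first test is rewritten
# by the second one, so no stale state is left behind.
longprim_cs = [("PA_CL_VS_OUT_CNTL", 0x0), ("REG_A", 0x1000)]
fbo_cs = [("REG_A", 0x2000), ("PA_CL_VS_OUT_CNTL", 0x0)]
print(stale_registers(longprim_cs, fbo_cs))  # prints []
```

An empty result means no register programmed by the first test is left
untouched by the second.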

The only difference between the two command streams is that one emits
R_02881C_PA_CL_VS_OUT_CNTL with each draw and the other emits it only
once. When it is emitted with each draw, the GPU locks up.
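
As a rough sketch of how such a difference can be spotted mechanically,
the snippet below counts register writes in two streams and reports the
registers whose counts differ. The packet tuples ("reg", name) and
("draw",) and the function names are assumptions made for the example,
not the layout of the dumps listed below.

```python
from collections import Counter


def writes_per_register(cs):
    """Count register writes in a stream of ("reg", name) / ("draw",) packets."""
    return Counter(pkt[1] for pkt in cs if pkt[0] == "reg")


def emission_diff(good, bad):
    """Map each register whose write count differs to (good_count, bad_count)."""
    g, b = writes_per_register(good), writes_per_register(bad)
    return {reg: (g[reg], b[reg])
            for reg in sorted(set(g) | set(b))
            if g[reg] != b[reg]}


# Toy streams: the good one sets the register once, the bad one
# sets it again before every draw.
good = [("reg", "PA_CL_VS_OUT_CNTL"), ("draw",), ("draw",)]
bad = [("reg", "PA_CL_VS_OUT_CNTL"), ("draw",),
       ("reg", "PA_CL_VS_OUT_CNTL"), ("draw",)]
print(emission_diff(good, bad))  # prints {'PA_CL_VS_OUT_CNTL': (1, 2)}
```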

bad command stream: r600g-long-prim-simple-b.txt
good one: r600g-long-prim-simple-g.txt
diff: r600g-long-prim-simple-d.txt

Because the bad one emits more registers, some draw commands are moved to
the second cs.

Emitting some other registers along with PA_CL_VS_OUT_CNTL fixes the lockup
(I don't have a short list), but many other registers behave the same way
as PA_CL_VS_OUT_CNTL. So even if order does not matter, register grouping
definitely does. I really wish the hw were less picky about how
command streams are supposed to be formatted. Anyhow, given that we have
no information on which registers need to be emitted together, mimicking
fglrx sounds like the way to go.
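
To make the grouping idea concrete, here is a minimal sketch, not the
r600g code: registers are placed in groups, groups are always emitted in
one fixed order, and a whole group is re-emitted as soon as any of its
registers is dirty. The group names, REG_X, REG_Y and build_cs are all
hypothetical; the real groupings would have to be inferred from fglrx.

```python
# Hypothetical fixed emission order and group contents.
ATOM_ORDER = ["group_a", "group_b"]
GROUPS = {
    "group_a": ["PA_CL_VS_OUT_CNTL", "REG_X"],
    "group_b": ["REG_Y"],
}


def build_cs(state, dirty):
    """Return the (register, value) writes for one state update.

    Groups are walked in ATOM_ORDER, so relative order never changes.
    If any register of a group is dirty, the whole group is emitted.
    """
    cs = []
    for atom in ATOM_ORDER:
        regs = GROUPS[atom]
        if any(reg in dirty for reg in regs):
            cs.extend((reg, state[reg]) for reg in regs)
    return cs


state = {"PA_CL_VS_OUT_CNTL": 1, "REG_X": 2, "REG_Y": 3}
# Only PA_CL_VS_OUT_CNTL changed, but REG_X is emitted along with it.
print(build_cs(state, {"PA_CL_VS_OUT_CNTL"}))
# prints [('PA_CL_VS_OUT_CNTL', 1), ('REG_X', 2)]
```

The trade-off is a somewhat larger command stream in exchange for never
emitting a register without the others in its group.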

Cheers,
Jerome
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
