On 12.10.2016 01:49, Tom Stellard wrote:
On Tue, Oct 11, 2016 at 03:21:24PM +0200, Nicolai Hähnle wrote:
On 11.10.2016 07:36, Dave Airlie wrote:
On 11 October 2016 at 12:13, Dave Airlie <airl...@gmail.com> wrote:
On 11 October 2016 at 11:42, Dave Airlie <airl...@gmail.com> wrote:
On 11 October 2016 at 05:50, Dave Airlie <airl...@gmail.com> wrote:
On 10 October 2016 at 21:45, Arsenault, Matthew
<matthew.arsena...@amd.com> wrote:
I don't like adding explicit IR arguments for ABI arguments, especially this
one. Adding a special case for the first index feels dirty. The rest of LLVM
also won't be aware of the specialness of the argument. It would be
problematic because bugpoint would eliminate the unused argument and then
codegen would have to fail in some way when the argument is missing.

That's a good point, but is there an alternative without burning two
userdata SGPRs?

One possibility is to define an ABI that says:

1. SGPR0/1 points to an extra data region; it is reserved independently
from the shader arguments.
2. The first 64 bits of that extra data region point to the scratch buffer.
3. The main shader code can retrieve SGPR0/1 using an intrinsic.

This can be made to look somewhat similar to what HSA does.
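A minimal CPU-side sketch of the three points above, for illustration only. The struct name, the helper names, the trailing `driver_data` field, and the low/high dword order across SGPR0/1 are all assumptions, not anything defined by LLVM or Mesa; host pointers stand in for GPU virtual addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the "extra data region". Only the first field is
 * implied by the proposal; everything after it is an assumption. */
struct extra_data_region {
    uint64_t scratch_va;      /* point 2: first 64 bits point to the scratch buffer */
    uint32_t driver_data[6];  /* driver-defined contents (assumption) */
};

/* Driver side, point 1: split the region's 64-bit address across SGPR0
 * (low dword) and SGPR1 (high dword). The dword order is an assumption. */
static void pack_sgpr01(uint64_t region_va, uint32_t sgpr[2])
{
    sgpr[0] = (uint32_t)(region_va & 0xffffffffu);
    sgpr[1] = (uint32_t)(region_va >> 32);
}

/* Shader side, point 3: what a "get SGPR0/1" intrinsic would hand back. */
static uint64_t unpack_sgpr01(const uint32_t sgpr[2])
{
    return ((uint64_t)sgpr[1] << 32) | (uint64_t)sgpr[0];
}

/* Shader side, point 2: read the scratch pointer from the start of the
 * region. In this model the "GPU address" is really a host pointer. */
static uint64_t read_scratch_va(const uint32_t sgpr[2])
{
    const void *region = (const void *)(uintptr_t)unpack_sgpr01(sgpr);
    uint64_t scratch_va;

    assert(region != NULL);
    memcpy(&scratch_va, region, sizeof(scratch_va));
    return scratch_va;
}
```

In this model the compiler never sees the scratch pointer as an IR argument; it only needs the reserved SGPR pair and one 64-bit load.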

What if we stored all shader inputs in the 'extra data region', with an
ABI that defined fixed offsets in the 'extra data region' for each

Then as an optimization we could have the compiler map the values that
it needed from the 'extra data region' into user sgprs and communicate
this back to the driver.

This gets us something that works very quickly and still allows us to do
optimizations in the future.

That sounds overly complicated to me. I think it's a good thing that the driver can control the layout as much as possible.

We're likely to run into situations where we want to change some fixed aspect of the layout, and then changes need to be coordinated between the driver and LLVM.

Also, I'd mostly expect layout optimizations to be aimed at making state changes cheaper, and the driver is really the piece of code that has the information to do that, not the compiler.



We should just hardcode the behaviour and switch both radv/radeonsi
over in one go?

I'll try to code this up, using the first 64 bits of the first buffer
pointed to by userdata 0/1 to store things.

I've looked at doing a dword fetch from the first two words of the 0/1 userdata.

It's not optimal for Vulkan, unfortunately. The idea I had was to allocate
just one scratch buffer per command buffer, of the size required at the end,
and patch it in at the start of the command buffer. However, in the first
slot I was going to use the push constants/dynamic buffer to store the value,
and it looks like I need to keep a list of every one of these buffers I emit
and backpatch them all. It might not be too insane, just a slight bump in
keeping it simple.

I'm probably losing the plot here, but I'm considering a double indirection:

we load the 64-bit address from the first two dwords, then load 64 bits
from that address to get the value.
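A rough CPU-side model of that double indirection, as a sketch only. The function names and the low/high dword order are hypothetical, and host pointers stand in for GPU virtual addresses; the actual patch may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Combine the first two dwords of a buffer into one 64-bit address.
 * Low dword first is an assumption. */
static uint64_t first_two_dwords(const uint32_t *buf)
{
    return ((uint64_t)buf[1] << 32) | (uint64_t)buf[0];
}

/* Driver side: point a userdata buffer at a shared 64-bit slot. */
static void set_indirect_slot(uint32_t *buf, const uint64_t *slot)
{
    uint64_t va = (uint64_t)(uintptr_t)slot;

    buf[0] = (uint32_t)(va & 0xffffffffu);
    buf[1] = (uint32_t)(va >> 32);
}

/* Shader side: first load fetches the slot address from the buffer,
 * second load fetches the 64-bit value stored in that slot. */
static uint64_t load_scratch_value(const uint32_t *userdata_buf)
{
    const void *slot = (const void *)(uintptr_t)first_two_dwords(userdata_buf);
    uint64_t value;

    assert(slot != NULL);
    memcpy(&value, slot, sizeof(value));
    return value;
}
```

If several emitted buffers point at the same slot, writing the slot once at the end updates what all of them resolve to, which is the backpatching this approach is meant to avoid.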

This saves me allocating scratch BOs for secondary command buffers,
and also having to allocate ever-increasing scratch BOs as shaders that
need more scratch get bound to the pipeline.
I'm not sure how much of an effect this would have for GL though.

I've posted a patch to this effect to the LLVM Phabricator.

It definitely is cleaner for the radv driver.

I still think it would be nice to have the level of indirection or
whatever one wants to call it as a function attribute. This would allow
you to change your mind about e.g. just sticking the scratch pointer
directly into SGPR0/1. radeonsi and radv don't have to be identical in
that regard.

mesa-dev mailing list