Re: [Mesa-dev] [PATCH 16/20] radeonsi: add FMASK texture binding slots and resource setup

Christian König Fri, 09 Aug 2013 01:35:13 -0700

On 08.08.2013 21:38, Alex Deucher wrote:

On Thu, Aug 8, 2013 at 1:34 PM, Marek Olšák <mar...@gmail.com> wrote:

On Thu, Aug 8, 2013 at 6:57 PM, Christian König <deathsim...@vodafone.de> wrote:

On 08.08.2013 16:33, Marek Olšák wrote:

On Thu, Aug 8, 2013 at 3:09 PM, Christian König <deathsim...@vodafone.de> wrote:

On 08.08.2013 14:38, Marek Olšák wrote:

On Thu, Aug 8, 2013 at 9:47 AM, Christian König <deathsim...@vodafone.de> wrote:

On 08.08.2013 02:20, Marek Olšák wrote:

FMASK is bound as a separate texture. For every texture, there can be
an FMASK. Therefore a separate array of resource slots has to be
added.

This adds a new mechanism for emitting resource descriptors. Its features
are:
- resource descriptors are stored in an ordinary buffer (not in a CS)


Having resource descriptors outside of the CS has two problems that we need
to solve first:

1. Fine-grained descriptor updates don't work; I already tried that. The
problem is that, unlike on previous ASICs, descriptors are now a memory
block and no longer part of the CP context. So when we (for example) have a
draw command executing and the next draw command uses new resources for a
specific slot, we would either block until the first draw command is
finished (which is bad for performance) or change the descriptors while they
are still in use (which results in VM faults).

So what would the proper solution be here? Do I need to flush some
caches or would moving the descriptor updates to the constant IB fix
that?


Actually the current implementation worked better than anything else I
tried.

When you really need the resource descriptors in a separate buffer, you need
to use one buffer for each draw call and always write the full buffer
contents (no partial updates). Flushing anything won't really help either.

The only solution I see using one buffer is to block with WAIT_REG_MEM until
the last draw call is finished, but that would be quite disastrous for
performance.

2. If my understanding is correct, when they are embedded the descriptors
are preloaded into the caches while executing the IB. So to achieve the same
speed with descriptors outside of the IB, you need to add additional
commands to the constant IB, which is new to SI and which we currently don't
support in the CS interface.

There seems to be support for the constant IB. The CS ioctl chunk ID
is RADEON_CHUNK_ID_CONST_IB and the allowed packets are listed in
si_vm_packet3_ce_check. Is there anything missing?


The userspace side seems to be missing, and except for throwing NOP packets
into it we never tested it. I know from the closed source side that it
actually was quite tricky for them to get working.

In addition to that, please note that I'm not 100% sure that just putting
the descriptors into the IB is really helping here. It was just the simplest
solution to avoid allocating a new buffer on each draw call.

I understand. I don't really need to have resource descriptors in a
separate buffer, all I need is these 3 basic features a gallium driver
should support:
- fine-grained resource updates (mainly for performance, see below)
- ability to unbind resources (e.g. by setting IMG_RSRC_WORD1 to 0)
- no GPU crash if a shader is using SAMPLER[15] but there are no samplers
  bound

FYI, partial sampler view and sampler state updates are coming to
gallium, Brian Paul already has some patches, it's just a matter of
time now. Vertex and constant buffer states already support partial
updates.


That shouldn't be too much of a problem.

Just allocate a state at startup and initialize it with the proper PM4
commands for 16 samplers, then update the resource descriptors in that state
when we change the bound textures/samplers/views/constants/whatever. All we
need to do then is set the emitted state to NULL so that it gets
re-emitted in the next draw command.
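
A minimal sketch of the scheme described above, written as plain C rather
than actual radeonsi code. All names here (struct context, set_sampler_desc,
draw, emit_count) are hypothetical, and the size of 8 dwords per descriptor
is an assumption made for illustration. The point is only the mechanism:
the state is allocated once, any change clears the "emitted" pointer, and
the next draw re-emits the whole block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SAMPLERS 16 /* slots pre-allocated at startup */
#define DESC_DWORDS   8 /* assumed descriptor size, for illustration only */

/* Hypothetical stand-in for a pre-built state holding all descriptors. */
struct sampler_state {
    uint32_t desc[NUM_SAMPLERS][DESC_DWORDS];
};

struct context {
    struct sampler_state samplers;       /* allocated once, updated in place */
    const struct sampler_state *emitted; /* NULL means "must be re-emitted" */
    unsigned emit_count;                 /* counts full re-emits in this sketch */
};

static void context_init(struct context *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
    ctx->emitted = NULL;
}

/* Update one slot on the CPU side and invalidate the emitted state. */
static void set_sampler_desc(struct context *ctx, unsigned slot,
                             const uint32_t *desc)
{
    assert(slot < NUM_SAMPLERS);
    memcpy(ctx->samplers.desc[slot], desc, sizeof(ctx->samplers.desc[slot]));
    ctx->emitted = NULL;
}

/* On draw, re-emit all 16 descriptors only if the state is not current. */
static void draw(struct context *ctx)
{
    if (ctx->emitted != &ctx->samplers) {
        ctx->emit_count++; /* a real driver would write PM4 packets here */
        ctx->emitted = &ctx->samplers;
    }
}
```

Note that in this sketch a change to a single slot still causes all 16
descriptors to be emitted again on the next draw.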

That would re-emit all 16 shader resources even if just one of them
needs to be changed. I was trying to avoid this inefficiency. Is it
really impossible to emit just one resource descriptor and keep the
others unchanged? This is a basic D3D10/11 feature, for example:

void ID3D11DeviceContext::VSSetShaderResources(
   [in]  UINT StartSlot,
   [in]  UINT NumViews,
   [in]  ID3D11ShaderResourceView *const *ppShaderResourceViews
);

If the constant engine is required to implement this interface
efficiently, then I'd like to work on constant IB support.

You'll need to either store them in memory or re-emit them if you
store them in the IB.  The CE is mainly there so that it can prime the
TC in parallel with the command stream processing.

Yeah indeed. The CE is just for prefetching everything into caches and doesn't really help here.

The only two options I see are either fully emitting it into the command stream whenever anything changes, or allocating a new buffer for the resources on each new draw call, copying over the old state and then setting just the things that changed. Both options have their pros and cons, no idea what might be better.

Fact is the resource descriptors are not allowed to change as long as the shaders are running.
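
The second option (new buffer per draw, copy the old state, patch what
changed) could look roughly like the sketch below. This is plain C with
malloc standing in for a GPU buffer allocation; the names desc_table and
desc_table_update and the size of 8 dwords per descriptor are assumptions
for illustration, not existing driver code. The old table is never written,
so a draw that is still using it keeps seeing unchanged descriptors.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_SLOTS   16 /* resource slots per table */
#define DESC_DWORDS  8 /* assumed descriptor size, for illustration only */

/* Hypothetical descriptor table; stands in for a GPU buffer. */
struct desc_table {
    uint32_t desc[NUM_SLOTS][DESC_DWORDS];
};

/*
 * Build the table for the next draw: copy the previous table (or start
 * from zeros), then overwrite exactly one slot. Passing new_desc == NULL
 * zeroes the slot, i.e. unbinds it. Returns NULL on allocation failure.
 */
static struct desc_table *
desc_table_update(const struct desc_table *old, unsigned slot,
                  const uint32_t *new_desc)
{
    struct desc_table *t;

    assert(slot < NUM_SLOTS);
    t = malloc(sizeof(*t));
    if (!t)
        return NULL;

    if (old)
        memcpy(t, old, sizeof(*t));
    else
        memset(t, 0, sizeof(*t));

    if (new_desc)
        memcpy(t->desc[slot], new_desc, sizeof(t->desc[slot]));
    else
        memset(t->desc[slot], 0, sizeof(t->desc[slot]));

    return t;
}
```

The cost is one allocation plus a full copy per changed draw, and the old
tables can only be freed once the draws referencing them have finished.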


Christian.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
