On Fri, May 20, 2016 at 3:35 AM, Jose Fonseca <jfons...@vmware.com> wrote:
> On 20/05/16 00:34, Rob Clark wrote:
>> On Thu, May 19, 2016 at 6:21 PM, Eric Anholt <e...@anholt.net> wrote:
>>> Rob Clark <robdcl...@gmail.com> writes:
>>>> So some rendering patterns that I've seen in apps turn out to be
>>>> somewhat evil for tiling gpu's.. couple cases I've seen:
>>>> 1) stk has some silliness where it binds an fbo, clears, binds another
>>>> fbo, clears, binds the previous fbo and draws, and so on. This one is
>>>> probably not too hard to just fix in stk.
>>>> 2) I've seen a render pattern in manhattan where app does a bunch of
>>>> texture uploads mid-frame via a pbo (and then generates mipmap levels
>>>> for the updated texture, which hits the blit path which changes fb
>>>> state and forces a flush). This one is probably not something that can
>>>> be fixed in the app ;-)
>>>> There are probably other cases where this comes up which I haven't
>>>> noticed yet. I'm not entirely sure how common the pattern that I see
>>>> in manhattan is.
>>>> At one point, Eric Anholt mentioned the idea of tracking rendering
>>>> cmdstream per render-target, as well as dependency information between
>>>> these different sets of cmdstream (if you render to one fbo, then turn
>>>> around and sample from it, the rendering needs to happen before the
>>>> sampling). I've been thinking a bit about how this would actually
>>>> work, and trying to do some experiments to get an idea about how
>>>> useful this would be.
>>> My plan was pretty much what you laid out here, except I was going to
>>> just map to my CL struct with a little hash table from the FB state
>>> members since FB state isn't a CSO.
>> ok, yeah, I guess that solves the naming conflict (fd_batch(_state)
>> sounds nicer for what its purpose really is than
> llvmpipe is also a tiler and we've seen similar patterns. Flushing reduces
> caching effectiveness, however in llvmpipe quite often texture sampling is
> the bottleneck, and an additional flush doesn't make a huge difference.
interesting, llvmpipe hadn't occurred to me
> I think the internal hash table as Eric proposes seems a better first step.
> Later on we could try to make framebuffer state a first class cso, but I
> suspect you'll probably want to internally walk all pending FBOs' CLs anyway
> (to see which need to be flushed on transfers.)
> So first changing the driver internals, then abstracting if there are
> commonalities, seems a more effective way forward.
yeah, makes sense.. and I'm planning to go w/ Eric's idea to keep
fd_batch separate from framebuffer state.
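
To make the "little hash table from the FB state members" idea concrete, here
is a rough sketch of the lookup. It is only an illustration and not actual
freedreno or vc4 code: struct fb_key, struct batch, struct batch_cache and
batch_from_fb() are all made-up names, and the key fields are just a guess at
which framebuffer state members would matter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BATCHES 32  /* table capacity, must be a power of two */

/* Hypothetical stand-in for the framebuffer state members used as the key.
 * All fields are uint32_t so the struct has no padding bytes. */
struct fb_key {
    uint32_t color_id;      /* id of the bound color surface */
    uint32_t zs_id;         /* id of the depth/stencil surface, 0 if none */
    uint32_t width, height;
};

/* Hypothetical per-render-target batch of recorded commands. */
struct batch {
    struct fb_key key;
    int in_use;
    unsigned num_draws;
};

/* Must be zero-initialized before first use. */
struct batch_cache {
    struct batch slots[MAX_BATCHES];
};

/* FNV-1a over the raw key bytes. */
static uint32_t hash_key(const struct fb_key *k)
{
    const unsigned char *p = (const unsigned char *)k;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the pending batch for this framebuffer state, creating it on first
 * use. Linear probing, no deletion. Returns NULL when the table is full; a
 * real driver would presumably flush an old batch to make room instead. */
static struct batch *batch_from_fb(struct batch_cache *c,
                                   const struct fb_key *k)
{
    assert(c && k);
    uint32_t start = hash_key(k) & (MAX_BATCHES - 1);
    for (uint32_t n = 0; n < MAX_BATCHES; n++) {
        struct batch *b = &c->slots[(start + n) & (MAX_BATCHES - 1)];
        if (!b->in_use) {
            b->key = *k;
            b->in_use = 1;
            b->num_draws = 0;
            return b;
        }
        if (memcmp(&b->key, k, sizeof(*k)) == 0)
            return b;
    }
    return NULL;
}
```

With something like this, binding fbo A, then fbo B, then fbo A again hands
back the same batch for A, so the draws for A keep accumulating in one place
instead of forcing a flush at every rebind.
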
It did occur to me that I forgot to think about the write-after-read
hazard case. Those need to be handled with an extra dependency
between batches too.
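
Roughly what I have in mind for the dependency tracking, including the
write-after-read case, is sketched below. This is a toy model under my own
assumptions, not driver code: batches are just indices, dependencies are a
bitmask, and struct ctx, struct resource, batch_read(), batch_write() and
batch_flush() are hypothetical names. It also simply asserts on dependency
cycles, which a real driver would have to resolve (probably by flushing).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BATCHES 32

struct batch {
    uint32_t deps;   /* bitmask of batch indices that must flush first */
    int flushed;
};

struct resource {
    int writer;        /* index of the pending batch writing it, or -1 */
    uint32_t readers;  /* bitmask of pending batches reading it */
};

/* Must be zero-initialized before first use. */
struct ctx {
    struct batch batches[MAX_BATCHES];
    int order[MAX_BATCHES];  /* batch indices in the order they were flushed */
    int num_flushed;
};

/* Does batch a depend on batch b, directly or transitively? */
static int depends_on(const struct ctx *c, int a, int b)
{
    uint32_t deps = c->batches[a].deps;
    if (deps & (1u << b))
        return 1;
    for (int i = 0; i < MAX_BATCHES; i++)
        if ((deps & (1u << i)) && depends_on(c, i, b))
            return 1;
    return 0;
}

/* Record that batch b must be flushed after batch dep. */
static void add_dep(struct ctx *c, int b, int dep)
{
    if (dep == b || c->batches[dep].flushed)
        return;
    assert(!depends_on(c, dep, b));  /* cycles are not handled in this sketch */
    c->batches[b].deps |= 1u << dep;
}

static void batch_read(struct ctx *c, int b, struct resource *r)
{
    if (r->writer >= 0)
        add_dep(c, b, r->writer);  /* read-after-write */
    r->readers |= 1u << b;
}

static void batch_write(struct ctx *c, int b, struct resource *r)
{
    for (int i = 0; i < MAX_BATCHES; i++)
        if (r->readers & (1u << i))
            add_dep(c, b, i);      /* write-after-read */
    if (r->writer >= 0)
        add_dep(c, b, r->writer);  /* write-after-write */
    r->writer = b;
    r->readers = 0;
}

/* Flush a batch after everything it depends on. */
static void batch_flush(struct ctx *c, int b)
{
    if (c->batches[b].flushed)
        return;
    for (int i = 0; i < MAX_BATCHES; i++)
        if (c->batches[b].deps & (1u << i))
            batch_flush(c, i);
    c->batches[b].flushed = 1;
    c->order[c->num_flushed++] = b;
}
```

So if batch 0 renders to a texture, batch 1 samples it, and batch 2 then
overwrites it, flushing batch 2 ends up flushing 0, then 1, then 2. The
write-after-read edge from 2 to 1 is exactly the extra dependency mentioned
above.
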
And at least for this particular case, I somehow need some cleverness
to discard or clone the old bo to avoid that write-after-read forcing
a flush. (Maybe in transfer_map? But I guess there are other paths..
Freedreno mailing list