>>>>>>> But giving the CS IOCTL an option for directly specifying the BOs
>>>>>>> instead of a BO list like Marek suggested would indeed save us some time
>>>>>>> here.
>>>>>> Interesting. I always follow efforts to improve our CS ioctl, since UMD
>>>>>> guys often complain that our command submission is slower than on
>>>>>> Windows. How would we directly specify the BOs instead of a BO list? As
>>>>>> a BO handle array from the UMD? Could you describe this more clearly?
>>>>>> Is it doable?
>>>>> Making the BO list part of the CS IOCTL wouldn't help at all for the
>>>>> closed source UMDs. To be precise we actually came up with the BO list
>>>>> approach because of their requirement.
>>>>> The biggest bunch of work during CS is reserving all the buffers,
>>>>> validating them and checking their VM status.
>>>> Totally agree. Every time I read the code there, I want to optimize
>>>> it.
>>>>> It doesn't matter if the BOs come from the BO list or directly in the CS
>>>>> IOCTL.
>>>>> The key point is that CS overhead is pretty much irrelevant for the open
>>>>> source stack, since Mesa does command submission from a separate thread
>>>>> anyway.
>>>> If it is irrelevant for the open stack, then how does the open source
>>>> stack handle "The biggest bunch of work during CS is reserving all the
>>>> buffers, validating them and checking their VM status."?
>> Command submission on the open stack is outsourced to a separate user space
>> thread. E.g. when an application triggers a flush the IBs created so far are
>> just put on a queue and another thread pushes them down to the kernel.
>> I mean reducing the overhead of the CS IOCTL is always nice, but you usually
>> won't see any fps increase as long as not all CPUs are completely bound to
>> some tasks.
>>>> If the open stack has a better way, I think the closed stack can follow
>>>> it. I don't know the history.
>>> Do you not use bo list at all in mesa? radv as well?
>> I don't think so. Mesa just wants to send the list of used BOs down to the
>> kernel with every IOCTL.
> The CS ioctl actually costs us some performance, but not as much as on
> closed source drivers.
> MesaGL always executes all CS ioctls in a separate thread (in parallel
> with the UMD) except for the last IB that's submitted by SwapBuffers.

... or by an explicit glFinish or glFlush (at least when the current
draw buffer isn't a back buffer) call, right?
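
As a rough illustration of the threaded submission model discussed above
(this is not Mesa's actual code; struct submit_queue, queue_ib(),
queue_drain() and the empty kernel_submit() placeholder are all hypothetical
names), a flush only queues the IB, while a glFinish-like call has to wait
until the queue has drained:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 16              /* hypothetical ring size */

struct ib { int id; };              /* stand-in for an indirect buffer */

struct submit_queue {
    pthread_mutex_t lock;
    pthread_cond_t  changed;        /* signalled on every state change */
    struct ib       ring[QUEUE_DEPTH];
    size_t          head, count;
    bool            busy;           /* worker is inside kernel_submit() */
    bool            stop;
    unsigned        submitted;      /* IBs handed to the "kernel" so far */
    pthread_t       worker;
};

/* Placeholder for the real CS ioctl; it does nothing in this sketch. */
static void kernel_submit(const struct ib *ib) { (void)ib; }

static void *submit_worker(void *arg)
{
    struct submit_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->count == 0 && !q->stop)
            pthread_cond_wait(&q->changed, &q->lock);
        if (q->count == 0)          /* stop requested and nothing left */
            break;

        struct ib ib = q->ring[q->head];
        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
        q->busy = true;
        pthread_cond_broadcast(&q->changed);    /* a slot became free */

        pthread_mutex_unlock(&q->lock);
        kernel_submit(&ib);         /* the expensive part runs unlocked */
        pthread_mutex_lock(&q->lock);

        q->busy = false;
        q->submitted++;
        pthread_cond_broadcast(&q->changed);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

static int queue_init(struct submit_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->head = q->count = 0;
    q->busy = q->stop = false;
    q->submitted = 0;
    return pthread_create(&q->worker, NULL, submit_worker, q);
}

/* Application-side flush: blocks only while the ring is full. */
static void queue_ib(struct submit_queue *q, struct ib ib)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_DEPTH)
        pthread_cond_wait(&q->changed, &q->lock);
    assert(q->count < QUEUE_DEPTH);
    q->ring[(q->head + q->count) % QUEUE_DEPTH] = ib;
    q->count++;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* Synchronous case: wait until everything queued has been submitted. */
static void queue_drain(struct submit_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count > 0 || q->busy)
        pthread_cond_wait(&q->changed, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

static void queue_fini(struct submit_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->changed);
    pthread_mutex_destroy(&q->lock);
}
```

With only one IB per frame the application ends up in queue_drain() right
after queue_ib(), so the extra thread adds hand-off cost without hiding any
of the ioctl time.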

> For us, it's certainly useful to optimize the CS ioctl because of apps
> that submit only 1 IB per frame where multithreading has no effect or
> may even hurt performance.

Another possibility might be flushing earlier, e.g. when the GPU and/or
CS submission thread are idle. But optimizing the CS ioctl would still
help in that case.

Finding good heuristics which allow better utilization of the GPU / CS
submission thread and don't hurt performance in any scenario might be
tricky though.
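
To make the early-flush idea a bit more concrete, here is a minimal sketch
of what such a check could look like. Everything in it is an assumption for
illustration: struct flush_state, its fields and the MIN_EARLY_FLUSH_DWORDS
threshold are made-up names and values, not anything that exists in Mesa or
the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: below this, an IB is considered too small to be
 * worth a CS ioctl of its own. */
#define MIN_EARLY_FLUSH_DWORDS 1024

/* Hypothetical snapshot of the state the driver would look at. */
struct flush_state {
    size_t pending_dwords;      /* commands recorded in the current IB */
    size_t ib_capacity;         /* size at which a flush is forced anyway */
    bool   gpu_idle;            /* GPU has no work left */
    bool   submit_thread_idle;  /* CS submission thread has nothing queued */
};

static bool should_flush_early(const struct flush_state *s)
{
    assert(s != NULL);

    /* A full IB has to be flushed regardless of any heuristic. */
    if (s->pending_dwords >= s->ib_capacity)
        return true;

    /* Avoid paying the CS overhead for a handful of commands. */
    if (s->pending_dwords < MIN_EARLY_FLUSH_DWORDS)
        return false;

    /* Flush early when the GPU and/or the submission thread are idle. */
    return s->gpu_idle || s->submit_thread_idle;
}
```

The tricky part is choosing the threshold: set too low, it produces many
small submissions and more CS overhead; set too high, the GPU sits idle for
longer than necessary.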

