Marek, 

I'm fine with keeping user buffers -- it was only ever a vague hope that they'd 
fade away over time, and I'm comfortable keeping them as long as their 
behaviour is well understood. 

The really important thing for me is to preserve traceability. That is to say, 
it should be possible to observe what happens over the interface and to infer 
directly from that when something important happens. In this case, that would 
mean having a way to notice that the contents and/or size of a user buffer 
have changed. 

That could be as simple as a notification call that this has happened, for 
example "redefine_user_buffer()". On your current drivers that call would be a 
no-op -- hopefully that's not going to be a noticeable performance hit? Then in 
some tracing or recording module, that call could be used to log the contents 
of the user buffer to a file; in some future indirect-rendering equivalent, 
the new contents could be transmitted across the wire; and so on. 
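
For concreteness, here's roughly what I have in mind -- the name and 
parameters below are just a strawman, not a final interface: 

    /* New pipe_context hook: the state tracker calls this whenever the
     * contents and/or size of a user buffer have changed. */
    void (*redefine_user_buffer)(struct pipe_context *ctx,
                                 struct pipe_resource *resource,
                                 unsigned offset,
                                 unsigned size);

    /* A hardware driver that reads the user memory directly can make
     * it a no-op: */
    static void hw_redefine_user_buffer(struct pipe_context *ctx,
                                        struct pipe_resource *resource,
                                        unsigned offset, unsigned size)
    {
       /* nothing to do */
    }

A tracing module would instead override the hook to dump the changed range to 
its log, and a remoting layer would retransmit it across the wire. 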

This would mean that user buffers continue to have a known size, etc., and it 
would require the state tracker to issue the redefine call as necessary. 
You're in a better position than I am to comment on the performance impact of 
this. 

If you're saying this isn't workable from a performance POV, then, as you 
suggest, we'll have to find a way to push the logic for identifying when user 
buffers change down into those components (tracing, remoting, etc.) which care 
about it. 

Keith 

----- Original Message -----
From: "Marek Olšák" <mar...@gmail.com> 
To: "Keith Whitwell" <kei...@vmware.com> 
Cc: mesa-dev@lists.freedesktop.org 
Sent: Sunday, 6 February, 2011 12:01:01 PM 
Subject: Re: [Mesa-dev] Gallium proposal: add a user pointer in pipe_resource 


Hi Keith, 

1) Recreating user buffers is very expensive, even though it's only the CALLOC 
overhead. Draw-call-heavy apps suffer badly from it. It's one of the things 
the gallium-varrays-optim branch tries to optimize, i.e. making the user buffer 
content mutable. I can't see another way out. 
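
To illustrate the difference (simplified pseudo-C, not the actual branch code; 
user_ptr is the field this thread proposes adding): 

    /* Today: a fresh user buffer on every draw call.  The CALLOC and
     * reference counting inside user_buffer_create add up quickly. */
    pipe_resource_reference(&vbuf->buffer, NULL);
    vbuf->buffer = pipe_user_buffer_create(screen, (void *)ptr, size,
                                           PIPE_BIND_VERTEX_BUFFER);

    /* With mutable user buffers: just repoint the one we already have. */
    vbuf->buffer->user_ptr = (void *)ptr;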


2) The map/unmap overhead is partially hidden by the fact that: 
- r300g doesn't unmap buffers when asked to; it defers unmapping until the 
command stream is flushed (see the sketch after this list). This optimization 
has resulted in about a 70% frame rate increase in Nexuiz. The remaining 
overhead there is mainly locking and unlocking a mutex and doing some checks. 
- r600g keeps all buffers mapped all the time, even textures. The only 
disadvantage is that it consumes address space. This is a result of our 
desperation with draw-call-heavy apps. (Do you remember that I wanted to add 
spinlocks? Frankly, that was another desperate move.) 
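
A rough sketch of the deferred-unmap idea (simplified, not the actual r300g 
code; my_buffer, my_context and the winsys calls are placeholders): 

    /* transfer_unmap doesn't really unmap; it only marks the mapping
     * as releasable.  The actual unmap happens at flush time. */
    static void buffer_transfer_unmap(struct pipe_context *pipe,
                                      struct pipe_transfer *transfer)
    {
       struct my_buffer *buf = my_buffer(transfer->resource);
       pipe_mutex_lock(buf->mutex);
       buf->unmap_pending = TRUE;
       pipe_mutex_unlock(buf->mutex);
    }

    static void context_flush(struct pipe_context *pipe)
    {
       struct my_context *ctx = my_context(pipe);
       struct my_buffer *buf;

       /* ... submit the command stream first ... */

       LIST_FOR_EACH_ENTRY(buf, &ctx->mapped_buffers, list) {
          if (buf->unmap_pending) {
             ctx->ws->buffer_unmap(buf->bo);   /* the real unmap */
             buf->unmap_pending = FALSE;
          }
       }
    }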

But it's not enough. We must prevent any unnecessary calls to 
transfer_map/unmap. If keeping the upload buffer mapped a little longer results 
in a 4% performance increase, then I want it. I have measured the real increase 
from this in Torcs and it's simply worth it. The problem with inline transfers 
is that they behave just like map/unmap, so they wouldn't improve anything. 


3) Not sure if you noticed, but constants are now set via user buffers as well. 
IIRC, the Radeon and Nouveau people welcomed this change. The thing is, every 
driver uses a different approach to uploading constants, and all each one needs 
to do the best job is a direct pointer to 
gl_program_parameter_list::ParameterValues. Previously, drivers stored 
constants in malloc'd memory, which was basically just a temporary copy of 
ParameterValues. Eliminating that copy was the main motivation for using user 
buffers for constants. r300g copies the constants into the command stream 
directly, whereas r600g uses u_upload_mgr, and I guess other drivers do 
something entirely different. As you can see, we can't get rid of user buffers 
while keeping all drivers on the fast path. But I'd be OK with a new 
set_constant_buffer(data?) function which takes a pointer to constants instead 
of a resource. With that, we could remove the overhead of user_buffer_create 
for constants. The original set_constant_buffer function could then be reserved 
for ARB_uniform_buffer_object, but ideally shouldn't be used for anything 
else. 
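
Something along these lines (just a strawman signature): 

    /* Fast path: constants come straight from the state tracker's copy
     * (e.g. gl_program_parameter_list::ParameterValues), with no
     * intermediate pipe_resource. */
    void (*set_constant_user_buffer)(struct pipe_context *ctx,
                                     uint shader, uint index,
                                     const void *data, unsigned size);

    /* The existing resource-based entry point stays for
     * ARB_uniform_buffer_object: */
    void (*set_constant_buffer)(struct pipe_context *ctx,
                                uint shader, uint index,
                                struct pipe_resource *buf);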


I fully understand that you want a robust interface. I would totally agree with 
you if I hadn't spent months profiling Mesa. I'd like the same thing, except 
that I also want the interface to be performance-oriented, and I'm afraid it 
will be very hard to have both at the same time. Other driver developers and I 
really want to compete with proprietary drivers in terms of performance. 


On Tue, Feb 1, 2011 at 6:55 PM, Keith Whitwell <kei...@vmware.com> wrote: 


So the optimization we're really talking about here is saving the 
map/unmap overhead on the upload buffer? 

And if the state tracker could do the uploads without incurring the 
map/unmap overhead, would that be sufficient for you to feel comfortable 
moving this functionality up a level? 


Because one of the keys to performance is doing as little CPU work as possible, 
I'd like the upload buffer to stay mapped as long as possible, and I'd like it 
to be used for drawing while it is mapped. This is OK for Radeons, because the 
GPU can read one part of the buffer while another part is being filled by the 
CPU. However, it wouldn't change the situation with regard to recording and 
replaying at all. This is one of the reasons I'd like user buffers to stay, 
and I'd like them to be mutable, at least for vertices. The eventual 
record/replay module could use the information provided by 
pipe_draw_info::min_index and max_index to know which regions of the user 
buffers may have changed. 
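
Roughly like this (a sketch; record_user_buffer_range is a hypothetical 
replay-module helper): 

    /* Conservative dirty range for one user vertex buffer, derived from
     * the pipe_draw_info of the current draw call. */
    const struct pipe_vertex_buffer *vb = &vertex_buffers[i];
    unsigned start = vb->buffer_offset + info->min_index * vb->stride;
    unsigned end   = vb->buffer_offset + info->max_index * vb->stride
                     + vb->stride;   /* cover the last vertex */
    record_user_buffer_range(vb->buffer, start, end - start);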




But we still must be calculating that somewhere -- the min_/max_index 
info has to come from somewhere in the state tracker. 


This info can be obtained from pipe_draw_info, and that's sufficient. There is 
no reason to set pipe_vertex_buffer::max_index or pipe_resource::width0 for 
user buffers; neither r300g nor r600g uses that additional information. 
Computing pipe_vertex_buffer::max_index, and recreating user buffers only to 
set width0 (which I don't need for anything), significantly reduces performance 
in draw-call-heavy apps. For example, if I dropped this computation, there 
would be a 40% performance increase with r300g in the Torcs racing game on the 
Forza track (I've measured it). Even seemingly performance-unrelated code can 
have a huge impact. (My experience tells me I should replace "even" with 
"all".) 
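
For reference, the computation I'd like to drop is essentially a scan over the 
whole index buffer on every draw call (sketch): 

    /* O(count) work per draw just to derive min/max_index: */
    unsigned min_index = ~0u, max_index = 0;
    unsigned i;
    for (i = 0; i < count; i++) {
       if (indices[i] < min_index)
          min_index = indices[i];
       if (indices[i] > max_index)
          max_index = indices[i];
    }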


No matter how I look at this whole issue, I can't see user buffers going away 
without losses. I may try to move this functionality up one level in a branch, 
minimize the losses as much as possible (possibly by keeping some buffers 
persistently mapped at the driver level, not at the state-tracker level), see 
how it performs, and then decide what to do next. But I can't say now how fast 
it will be. 


Best regards 
Marek 