Hi Robert,

My understanding is that bindless graphics achieves its speedup by removing the 
CPU-side dereferencing of graphics object pointers, and with it the L2 cache 
misses those dereferences are likely to cause.

NVIDIA states:

OpenGL has evolved in a way that allows applications to replace many of the 
original state machine variables with blocks of user-defined data. For example, 
the current vertex state has been augmented by vertex buffer objects, 
fixed-function shading state and parameters have been replaced by 
shaders/programs and constant buffers, etc. Applications switch between coarse 
sets of state by binding objects to the context or to other container objects 
(e.g. vertex array objects) instead of manipulating state variables of the 
context. In terms of the number of GL commands required to draw an object, this 
enables applications to be an order of magnitude more efficient. However, this 
explosion of objects bound to other objects has led to a new bottleneck - 
pointer chasing and CPU L2 cache misses in the driver, and general L2 cache 
pollution. 

Recent OpenGL graphics applications tend to change state at roughly these 
frequencies:

for (...) { // cold
    data downloads, render target changes, etc.
    for (...) { // warm
        bind textures
        for (...) { // hot
            bind constants
            bind vertex buffers
            Draw();
        }
    }
}

The most frequent state changes are binding vertex buffer objects (every draw), 
followed closely by binding constant buffers. Vertex buffer and constant buffer 
binds are significantly more expensive than one might expect. These binds 
require several reads from the driver internal object data structure to 
accomplish what the driver actually needs to do. In an OpenGL driver, it looks 
like this:

name->obj (look up the object by name)
obj->{refcount, GPU address, state, etc.} (dereference the object to 
reference-count it, get its GPU virtual address, and validate its state)

Each of these dereferences has a high probability of causing a CPU L2 cache 
miss due to the inherently LRU-eviction-unfriendly nature of graphics 
applications (each frame starts over at the beginning). These L2 cache misses 
are a huge bottleneck in modern drivers, and a penalty paid for every frame 
rendered.

End of the NVIDIA text.
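
To make the cost concrete, here is a rough C sketch of those two dependent 
reads. The struct and lookup function are invented purely for illustration; 
real driver internals are not public:

#include <stdint.h>

/* Hypothetical driver-side object -- invented for illustration only. */
typedef struct BufferObject {
    int      refcount;    /* bumped on every bind               */
    uint64_t gpu_address; /* the value the GPU actually needs   */
    int      state;       /* must be revalidated before drawing */
} BufferObject;

/* Stand-in for the driver's name->object hash table lookup. */
extern BufferObject *lookup_by_name(uint32_t name);

/* Roughly what a glBindBuffer-style call costs inside the driver. */
uint64_t bind_buffer(uint32_t name)
{
    /* Read 1: name->obj. The name table was last touched a whole
       frame ago, so this is a likely L2 cache miss. */
    BufferObject *obj = lookup_by_name(name);

    /* Read 2: obj->{refcount, GPU address, state}. Another likely
       miss, paid on every bind of every frame. */
    obj->refcount++;
    return obj->gpu_address; /* state validation omitted */
}

Bindless rendering sidesteps both reads by handing the application the GPU 
address up front.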

I think these extensions address the new bottleneck created by the switch to 
GL3-style vertex and constant buffers, shaders for everything, etc. It seems to 
be a graphics driver bottleneck, not a scene graph problem. What NVIDIA is 
doing is acknowledging the problem and giving OpenGL users a way to take 
advantage of an optimization technique.
The OpenGL extensions are GL_NV_shader_buffer_load and 
GL_NV_vertex_buffer_unified_memory.
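
For reference, here is a rough, untested paraphrase of the pattern in NVIDIA's 
bindless examples (function and variable names are my own; it assumes an 
extension loader such as GLEW has resolved the NV entry points). The GPU 
address is fetched and the buffer made resident once at setup; at draw time, 
glBufferAddressRangeNV replaces the per-draw glBindBuffer, so the driver never 
chases the name->obj->address pointers:

#include <GL/glew.h> /* assumes GLEW (or similar) exposes the NV entry points */

static GLuint64EXT vboAddr; /* GPU virtual address of the vertex buffer */
static GLsizeiptr  vboSize;

/* Setup, done once: fetch the GPU address and make the buffer resident
   (GL_NV_shader_buffer_load). */
void setupBindlessVBO(GLuint vbo, GLsizeiptr size)
{
    vboSize = size;
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV,
                                &vboAddr);
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}

/* Per draw: no glBindBuffer. Attrib 0 sources 3 floats per vertex
   straight from the GPU address (GL_NV_vertex_buffer_unified_memory). */
void drawBindless(GLsizei vertexCount)
{
    glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
    glEnableVertexAttribArray(0);
    glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat));
    glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0,
                           vboAddr, vboSize);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    glDisableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
}

GL_NV_shader_buffer_load additionally lets GLSL dereference such addresses 
directly as pointers, which is how constant data can skip its bind as well.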

I am not yet competent enough with GL3 to begin implementing these in code, but 
it seems worth doing at some point. I like to add quality to the things I 
pursue. As I progress, I will keep in touch. For now I am a toddler.

Thank you!

Cheers,
John
