Re: [Flightgear-devel] More 3D cloud changes

Yon Uriarte Mon, 15 Dec 2008 07:57:58 -0800

Hi,

On Mon, Dec 15, 2008 at 10:20 AM, Tim Moore <timo...@redhat.com> wrote:

> Yon Uriarte wrote:
> > Hi,
> >
> > On Sun, Dec 14, 2008 at 6:12 PM, Heiko Schulz <aeitsch...@yahoo.de> wrote:
> >
> >
> >     My last cloudset has fewer but larger sprites. I did this because I
> >     noticed that it saves a lot of fps. On the forum Gijs noticed the
> >     same with my cloudset.
> >
> >     More sprites means more alpha and more textures, which is always a
> >     fps-killer.
> No, the cost of blending is proportional to the area blended on the screen,
> so a few large sprites vs. many small sprites should cost about the same.
> It is true that it takes longer to sort a large number of sprites, but I'm
> not worried about the sorting cost at this point.

 I am not too worried about the cost of sorting either. Like you, I tested
doing a full sort each frame: save each matrix*vector result together with
the sprite index in a vector of (double, int) elements, sort on the saved
double and render by index. Something like this (not the full patch):

CloudShaderGeometry.hxx:
        struct sort_vector_element {
            double dist;
            int idx;
        };
        typedef std::vector<sort_vector_element> sort_vector;
        sort_vector _sortArray;

CloudShaderGeometry.cxx:
    // Comparison functor: orders elements by distance along the view direction.
    struct svp {
        bool operator()(const CloudShaderGeometry::sort_vector_element& a,
                        const CloudShaderGeometry::sort_vector_element& b) const {
            return a.dist < b.dist;
        }
    };
    ...
    sort_vector svt = _sortArray;   // working copy of the saved sort order

    if (skip_info->skip_count == 0)
    {
        osg::Matrix vm = state.getModelViewMatrix();

        // Transform the viewing direction, represented by the eye-space
        // vector (0, 0, -1, 0), into model space. (Here we simply take the
        // opposite direction and reverse the ordering when sorting.)
        // Caveat: OpenSceneGraph matrices are transposed!
        osg::Vec3f view_dir(vm(0, 2), vm(1, 2), vm(2, 2));

        for (unsigned int i = 0; i < _cloudsprites.size(); i++) {
            sort_vector_element& sar = svt[i];
            sar.dist = view_dir * _cloudsprites[i]->position.osg();
            sar.idx = i;
        }
        std::sort(svt.begin(), svt.end(), svp());
    }
    ...
    for(unsigned int i = 0; i < _sortArray.size(); i++)
    {
        unsigned int idx = svt[i].idx;
        sorted = sorted && idx == _sortArray[i].idx;
        const CloudSprite& t = *_cloudsprites[idx];
        GLfloat ua1[3] = { (GLfloat) t.texture_index_x/varieties_x,
                           (GLfloat) t.texture_index_y/varieties_y,
                           t.width };
        ...
    }
    ...
 The expensive part (really? I have not measured it) is the matrix*vector
math, which we already do over the full cloud vector every time we run the
bubble sort. We might as well save the result and do a (hopefully) quick
quicksort. I suspect the rendering time is really dominated by the calls to
glSomething: we resend all the vertices every frame, incurring a huge number
of glSomeFunction calls.

 I've got a patch for trees using VBOs. The performance gains are impressive,
allowing tight forests with "coverage/=10" (obj.cxx) (before hitting the
win32 3G limit at around 80 tiles, lol).
 My idea was to adapt that VBO patch to cloud rendering, using index buffers.
The vertex packs (v,n,tc) are static and the index buffers have to be
recalculated every frame (as in the quicksort above). Ideally one would
calculate a static vertex-pack buffer for the whole cloud layer after it is
generated, and then either draw indexed from that one huge VBO per cloud or,
even better, collect the indexes for each cloud without drawing anything and
draw the whole-layer index buffer in a single render call in a later render
bin (well, more than one glFunc; mapping the VBOs takes some calls). If I
arrive at some results I'll post a patch :) I'm doing more than one thing atm.
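
 A minimal sketch of the per-frame index rebuild described above. It is not
taken from any patch. It assumes each sprite is a quad stored as 4 consecutive
vertices in a static vertex buffer and that the view direction points away
from the viewer. Vec3, SortKey and buildSortedIndices are hypothetical names
used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not FlightGear or OSG types.
struct Vec3 { float x, y, z; };

struct SortKey {
    float dist;         // sprite position projected onto the view direction
    std::uint32_t idx;  // sprite number inside the static vertex buffer
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Rebuilds the per-frame index list. Assumes every sprite is a quad stored
// as 4 consecutive vertices in a static vertex buffer, and that viewDir
// points away from the viewer (larger projection = farther away).
std::vector<std::uint32_t> buildSortedIndices(const std::vector<Vec3>& positions,
                                              const Vec3& viewDir)
{
    // Vertex numbers (4 per sprite) must fit into 32-bit indices.
    assert(positions.size() <= 0xFFFFFFFFu / 4u);

    std::vector<SortKey> keys(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        keys[i].dist = dot(viewDir, positions[i]);
        keys[i].idx = static_cast<std::uint32_t>(i);
    }

    // Farthest sprite first, so blending happens back to front.
    std::sort(keys.begin(), keys.end(),
              [](const SortKey& a, const SortKey& b) { return a.dist > b.dist; });

    std::vector<std::uint32_t> indices;
    indices.reserve(keys.size() * 6);
    for (const SortKey& k : keys) {
        const std::uint32_t base = k.idx * 4;
        // Two triangles per quad: (0,1,2) and (0,2,3).
        const std::uint32_t quad[6] = { base, base + 1, base + 2,
                                        base, base + 2, base + 3 };
        indices.insert(indices.end(), quad, quad + 6);
    }
    return indices;
}
```

 Only this index list would change per frame. The vertex data stays on the
GPU, and the list could be uploaded to an element array buffer and drawn with
a single indexed draw call.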

> >     I'm pretty sure that there is another solution, looking at X-Plane,
> >     which seems to use the same technique for their clouds.
> >
> >
> >
> >  Fewer bigger sprites with few parts of the sprite with  0 < alpha < 1.
> > One opaque pre-pass drawing front to back with
> > alpha culling != 1 and blending off to fill up z-buffer then a second
> > pass back to front with alpha culling < 1 and blending on.
> > Second pass gets massively z-culled, saving on rop blending bandwidth.
> > First pass gets also z-culled by itself.
> >
> This is a good idea if, in fact, the cloud sprites are mostly opaque. I
> think we are doing various tricks where the sprites end up being completely
> transparent.

 Yup, rendering transparencies back to front incurs massive overdraw. Even if
most of the rendered cloud is either opaque or fully transparent, we are
write-hitting the color and z-buffers a proportionally high number of times.
And for 0<alpha<1 it's even worse: a full read-write cycle to GPU RAM.

 One may hope that the GPUs are smart enough to skip blending automatically
when alpha==0 or alpha==1; I guess I'm an optimist. One of my older patches
was not so optimistic and did an opaque pre-pass. At least the
alpha < 0+epsilon fragment culling is taken care of by the alphafunc.
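
 To illustrate why the opaque pre-pass saves blending work, here is a rough
CPU-side model of a single pixel covered by several sprite fragments. It is
only a sketch, not GL code and not the older patch. Fragment, PassStats and
both functions are hypothetical names, and a smaller depth is assumed to mean
closer to the eye.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical model of the fragments landing on one pixel.
struct Fragment { float depth; float alpha; };  // smaller depth = closer

struct PassStats { int opaqueWrites = 0; int blendedWrites = 0; };

static bool nearer(const Fragment& a, const Fragment& b) { return a.depth < b.depth; }

// Plain back-to-front rendering: the alpha test drops alpha <= epsilon,
// every other fragment goes through the blender.
PassStats drawBackToFront(std::vector<Fragment> frags, float epsilon)
{
    assert(epsilon >= 0.0f && epsilon < 1.0f);
    std::sort(frags.begin(), frags.end(), nearer);
    PassStats s;
    for (auto it = frags.rbegin(); it != frags.rend(); ++it) {
        if (it->alpha <= epsilon) continue;  // alpha test
        ++s.blendedWrites;
    }
    return s;
}

// Opaque pre-pass followed by a blended pass.
PassStats drawWithOpaquePrepass(std::vector<Fragment> frags, float epsilon)
{
    assert(epsilon >= 0.0f && epsilon < 1.0f);
    std::sort(frags.begin(), frags.end(), nearer);
    PassStats s;
    float zbuf = std::numeric_limits<float>::infinity();

    // Pass 1: front to back, alpha == 1 only, depth test and depth write on,
    // blending off. Later opaque fragments are culled by the z-buffer.
    for (const Fragment& f : frags) {
        if (f.alpha < 1.0f) continue;
        if (f.depth < zbuf) { zbuf = f.depth; ++s.opaqueWrites; }
    }

    // Pass 2: back to front, epsilon < alpha < 1 only, depth test on,
    // depth write off, blending on. Anything behind an opaque fragment
    // is z-culled before it reaches the blender.
    for (auto it = frags.rbegin(); it != frags.rend(); ++it) {
        if (it->alpha >= 1.0f || it->alpha <= epsilon) continue;
        if (it->depth < zbuf) ++s.blendedWrites;
    }
    return s;
}
```

 In this model the saving depends entirely on how many fragments are fully
opaque. With sprites that are mostly semi-transparent, the second pass blends
almost as much as the plain back-to-front pass.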

>
> Also, I don't think that the blending stage is a bottleneck for us at the
> moment.

  Yeah, at least not here (3870, 10x overkill for FG).
  I maximize the window from the default (800x600?) to approx. 1600x1200 and
the frame rate stays constant. This rules out texture-, fill- or blend-rate
bottlenecks. I seriously doubt we are vertex bottlenecked (to state the
obvious).

  I have to conclude that it is the application<->(API<->)GPU communication
that is bottlenecking the cloud rendering. My tree experiments (I hate ATI,
though not as much as nv ;) suggest this is the case. BTW, nice code there:
the QuadTreeBuilder is both elegant and cost me some sanity ;)

  I'll see if I can clean up those VBO patches for others to see. I think
they are not multi-GPU safe atm. If I configure multiple windows, is that a
good simulation of multi-GPU (multi-context)?
Tim, you had an excessive # of GPUs, I believe? :)

>
> >  It's a tradeoff between cpu and gpu fill and blend rate, whatever is
> > your bottleneck.
> > Sadly, that changes between machines, so the opaque pre-pass should be
> > configurable.
> Tim
>
>
greetings,
 yon

