Keith Whitwell wrote:
>> No, basically everything seems to be working. I was a bit concerned
>>  with the limits when the split occurs, but that seems to work fine
>>  on r200 (though probably not optimal since it's really a buffer 
>> size limit not a fixed vertex count limit for the driver 
>> currently). Not sure though what happens with large index counts 
>> (chances are there will be a split anyway in that case because of 
>> too large arrays), there may be hidden bugs there with the 
>> radeon/r200 drivers.
> 
> The split code can be set up to split on either large vertex buffers 
> or large index buffers.
> 
> There's no static cost associated with changing the split limits, so
>  just figure out the vertex size and divide that into the maximum 
> allowed vertex buffer size, and use that as the split limit.
Actually, I misremembered that. The driver has a vertex count limit, not a
vertex size limit, because each attribute needs to fit into a 64 KiB
(65536-byte) buffer. For 4 floats that is 4096 vertices, just above Mesa's
default of 3000. This was basically true even for the old code. I wanted to
increase the limit some time ago, but it wasn't really high priority, and
I wanted to be sure that it actually works with 4096 (and not only up to
4095, for instance).
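A minimal sketch of that arithmetic, assuming the per-attribute buffer is exactly 65536 bytes and every attribute is stored as floats. ATTRIB_BUFFER_BYTES, max_verts_per_attrib() and split_limit() are hypothetical names for illustration, not r200 driver code. The point is that the split limit is set by the widest single attribute, not by the total vertex size:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: size of the per-attribute buffer ("65k" = 64 KiB). */
#define ATTRIB_BUFFER_BYTES 65536u

/* Vertices whose data for one attribute of `components` floats fits
 * into a single attribute buffer. */
static unsigned max_verts_per_attrib(unsigned components)
{
    assert(components > 0); /* avoid dividing by zero */
    return ATTRIB_BUFFER_BYTES / (components * (unsigned)sizeof(float));
}

/* The usable split limit is the smallest per-attribute limit, i.e. the
 * one imposed by the widest attribute. */
static unsigned split_limit(const unsigned *components, size_t n_attribs)
{
    unsigned limit = ~0u;
    for (size_t i = 0; i < n_attribs; i++) {
        unsigned l = max_verts_per_attrib(components[i]);
        if (l < limit)
            limit = l;
    }
    return limit;
}
```

With 4-byte floats, a 4-component attribute gives 65536 / 16 = 4096 vertices, so a vertex with {4, 3, 2}-component attributes is still capped at 4096.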

> Currently it looks like r200 is just accepting whatever limits the 
> swtnl module is placing on itself, but there's no reason for this. 
> What I anticipate is that r200 will register its own draw_prims 
> function, which can do splitting if necessary.
> 
> Then you'd ideally upload the results straight into the hardware tnl 
> and never call swtnl or worry about the pipeline/whatever, only using
>  that code when a fallback is required.  But for the meantime, just 
> feed the results into swtnl.
Yes, the driver should do something similar to what r300 is doing. It's
just that with that 4096 limit it's not too useful; ideally the driver
should be upgraded to TTM instead of using another hack to get larger
buffers.
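A rough sketch of the kind of splitting a driver-side draw_prims could do for a plain triangle list, assuming a vertex limit like the 4096 above. split_triangles() and emit_chunk_fn are hypothetical, not Mesa's vbo split code; the callback stands in for whatever consumes a chunk (hardware TNL upload, or swtnl as a fallback). Each chunk is rounded down to whole triangles, so a 4096 limit yields chunks of 4095 vertices:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical consumer of one chunk of vertices. */
typedef void (*emit_chunk_fn)(unsigned start, unsigned count, void *ctx);

/* Split a triangle-list vertex range into chunks of at most `limit`
 * vertices, each a whole number of triangles. `emit` may be NULL to
 * only count chunks. Returns the number of chunks. */
static unsigned split_triangles(unsigned start, unsigned count, unsigned limit,
                                emit_chunk_fn emit, void *ctx)
{
    unsigned step = limit - limit % 3; /* largest multiple of 3 <= limit */
    unsigned chunks = 0;

    assert(step > 0);
    count -= count % 3; /* drop an incomplete trailing triangle */
    while (count > 0) {
        unsigned n = count < step ? count : step;
        if (emit)
            emit(start, n, ctx);
        start += n;
        count -= n;
        chunks++;
    }
    return chunks;
}
```

Strips and fans would additionally need overlapping vertices between chunks, which this sketch does not handle.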

>> I've done some quick measurements, ut2k3 got up from ~40fps to 
>> ~75fps (which is very nice, this probably brings it roughly back to
>>  where it once was in mesa 5 days if not better, even sw tnl got a 
>> big increase from 27fps to 49fps!), ipers (with lod 1) has about a 
>> 20% performance drop whereas trispd (with size 5) gets down from 
>> roughly 25M tris/s to 16M tris/s (obviously due to the removal of 
>> the vtxfmt code in the driver). I'd say that's quite good overall.
> 
> This is great.  At very least it means that we don't have to re-write
>  the vtxfmt stuff before this code can be merged to the trunk.  I'm 
> not sure where the speedups come from, but I'm not arguing about it 
> either.
I'd guess it has to do with the _ae_loopback_array_elt function no longer
being called. The single-vertex-emit fallback really had a high impact
(mostly because of the too-large arrays). For some reason this fallback
seemed faster in Mesa 5, but with vbo it's gone (memcpy now shows up
rather prominently in oprofile...). I'd have thought, though, that swtnl
would be limited more by math than by function calls...

Roland

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev