Roland Scheidegger wrote:
> Keith Whitwell wrote:
>>> No, basically everything seems to be working. I was a bit concerned
>>>  with the limits when the split occurs, but that seems to work fine
>>>  on r200 (though probably not optimal since it's really a buffer 
>>> size limit not a fixed vertex count limit for the driver 
>>> currently). Not sure though what happens with large index counts 
>>> (chances are there will be a split anyway in that case because of 
>>> too large arrays), there may be hidden bugs there with the 
>>> radeon/r200 drivers.
>> The split code can be set up to split on either large vertex buffers 
>> or large index buffers.
>>
>> There's no static cost associated with changing the split limits, so
>>  just figure out the vertex size and divide that into the maximum 
>> allowed vertex buffer size, and use that as the split limit.
> Actually, I misremembered that. There is a vertex count limit in the
> driver, not a vertex size limit, as each attribute needs to fit into a
> buffer of 65k - for 4 floats that is 4096 vertices, just above mesa's
> default of 3000. This was basically true even for the old code. I wanted
> to increase the limit some time ago, but it wasn't really high priority
> and I wanted to be sure that it actually works with 4096 (and not only
> up to 4095, for instance).
> 
>> Currently it looks like r200 is just accepting whatever limits the 
>> swtnl module is placing on itself, but there's no reason for this. 
>> What I anticipate is that r200 will register its own draw_prims 
>> function, which can do splitting if necessary.
>>
>> Then you'd ideally upload the results straight into the hardware tnl 
>> and never call swtnl or worry about the pipeline/whatever, only using
>>  that code when a fallback is required.  But for the meantime, just 
>> feed the results into swtnl.
> Yes, the driver should do something similar to what r300 is doing. It's
> just that with that 4096 limit it's not too useful, ideally the driver
> should be upgraded to ttm, instead of using another hack to get larger
> buffers.
> 
>>> I've done some quick measurements: ut2k3 went up from ~40fps to 
>>> ~75fps (which is very nice, this probably brings it roughly back to
>>> where it once was in the mesa 5 days, if not better; even sw tnl got
>>> a big increase from 27fps to 49fps!), ipers (with lod 1) has about a 
>>> 20% performance drop, whereas trispd (with size 5) gets down from 
>>> roughly 25M tris/s to 16M tris/s (obviously due to the removal of 
>>> the vtxfmt code in the driver). I'd say that's quite good overall.
>> This is great.  At the very least it means that we don't have to
>> re-write the vtxfmt stuff before this code can be merged to the trunk.
>> I'm not sure where the speedups come from, but I'm not arguing about
>> it either.
> I'd guess it has to do with the _ae_loopback_array_elt function not
> being called anymore, the single-vertex-emit fallback really had a
> high impact (mostly because of the too large arrays) - for some reason
> this fallback seemed faster in mesa 5, but with vbo it's gone (memcpy
> shows up rather prominently with oprofile now...). I'd have thought
> though that with swtnl it would have been more math limited rather than
> just limited by function calls...

It's possible that the little vertex cache in the split_copy routines 
helps in this case as well.  I wonder if disabling that makes the 
speedup go away.
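
To make the idea concrete, here is a minimal sketch of the kind of small vertex cache a copying split path could use, so that an index referenced several times is copied only once. This is an illustration only, not the actual split_copy code: the struct, the function names, the direct-mapped layout and the 32-slot size are all assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cache size; a real implementation may choose differently. */
#define CACHE_SLOTS 32u

/* Direct-mapped cache from source vertex index to index in the copied buffer. */
struct vertex_cache {
    uint32_t src_index[CACHE_SLOTS];
    uint32_t dst_index[CACHE_SLOTS];
    uint8_t  valid[CACHE_SLOTS];
    uint32_t next_dst;              /* number of vertices copied so far */
};

static void cache_reset(struct vertex_cache *c)
{
    memset(c, 0, sizeof *c);
}

/*
 * Return the destination index for source index 'src'.
 * *copied is set to 1 when the caller has to copy the vertex (cache miss),
 * and to 0 when an earlier copy can be reused (cache hit).
 */
static uint32_t cache_lookup(struct vertex_cache *c, uint32_t src, int *copied)
{
    uint32_t slot = src % CACHE_SLOTS;

    if (c->valid[slot] && c->src_index[slot] == src) {
        *copied = 0;
        return c->dst_index[slot];
    }

    /* Miss: take over the slot and hand out the next destination index. */
    c->valid[slot] = 1;
    c->src_index[slot] = src;
    c->dst_index[slot] = c->next_dst++;
    *copied = 1;
    return c->dst_index[slot];
}
```

With a cache like this, indexed geometry that reuses vertices heavily needs far fewer copies per split batch, which is one way the measured speedup could arise. Bypassing the lookup (always treating it as a miss) would be the experiment suggested above.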

Keith
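
For reference, the split-limit arithmetic discussed in the quoted text can be sketched as below. The first helper covers the "divide the vertex size into the maximum vertex buffer size" case, the second the per-attribute case described for r200 (each attribute in its own 65k buffer). The function names and the constant are hypothetical and are not taken from the Mesa or r200 sources.

```c
#include <stddef.h>

/* Assumption: "65k" in the thread means 65536 bytes per attribute buffer. */
#define ATTR_BUFFER_BYTES 65536u

/*
 * Limit when whole vertices share one buffer: the number of complete
 * vertices of 'vertex_bytes' that fit into 'buffer_bytes'.
 */
static unsigned split_limit_by_vertex_size(unsigned buffer_bytes,
                                           unsigned vertex_bytes)
{
    return vertex_bytes ? buffer_bytes / vertex_bytes : 0;
}

/*
 * Limit when every attribute has its own buffer: the widest attribute
 * determines how many vertices fit, regardless of the total vertex size.
 */
static unsigned split_limit_by_attribute(unsigned buffer_bytes,
                                         const unsigned *attr_bytes,
                                         size_t attr_count)
{
    unsigned widest = 0;
    size_t i;

    for (i = 0; i < attr_count; i++) {
        if (attr_bytes[i] > widest)
            widest = attr_bytes[i];
    }
    return widest ? buffer_bytes / widest : 0;
}
```

With 4-byte floats, a 4-component attribute takes 16 bytes, so split_limit_by_attribute(ATTR_BUFFER_BYTES, ...) with a widest attribute of 16 bytes gives 65536 / 16 = 4096 vertices, matching the figure quoted above. Whether the hardware accepts exactly 4096 or only 4095 is the open question from the thread and would still have to be tested.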

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
