Roland Scheidegger wrote:
> Keith Whitwell wrote:
>>> No, basically everything seems to be working. I was a bit concerned
>>> with the limits when the split occurs, but that seems to work fine
>>> on r200 (though probably not optimal since it's really a buffer
>>> size limit, not a fixed vertex count limit for the driver
>>> currently). Not sure though what happens with large index counts
>>> (chances are there will be a split anyway in that case because of
>>> too large arrays), there may be hidden bugs there with the
>>> radeon/r200 drivers.
>> The split code can be set up to split on either large vertex buffers
>> or large index buffers.
>>
>> There's no static cost associated with changing the split limits, so
>> just figure out the vertex size and divide that into the maximum
>> allowed vertex buffer size, and use that as the split limit.
> Actually, I misremembered that. There is a vertex count limit in the
> driver, not vertex size, as each attribute needs to fit into a buffer of
> 65k - for 4 floats that is 4096 vertices, just above mesa's default
> 3000. This was basically true even for the old code. I wanted to
> increase the limit some time ago, but it wasn't really high priority and
> I wanted to be sure that it actually works with 4096 (and not only up to
> 4095, for instance).
>
>> Currently it looks like r200 is just accepting whatever limits the
>> swtnl module is placing on itself, but there's no reason for this.
>> What I anticipate is that r200 will register its own draw_prims
>> function, which can do splitting if necessary.
>>
>> Then you'd ideally upload the results straight into the hardware tnl
>> and never call swtnl or worry about the pipeline/whatever, only using
>> that code when a fallback is required. But for the meantime, just
>> feed the results into swtnl.
> Yes, the driver should do something similar to what r300 is doing. It's
> just that with that 4096 limit it's not too useful. Ideally the driver
> should be upgraded to ttm, instead of using another hack to get larger
> buffers.
>
>>> I've done some quick measurements: ut2k3 got up from ~40fps to
>>> ~75fps (which is very nice, this probably brings it roughly back to
>>> where it once was in mesa 5 days if not better, even sw tnl got a
>>> big increase from 27fps to 49fps!), ipers (with lod 1) has about a
>>> 20% performance drop, whereas trispd (with size 5) gets down from
>>> roughly 25M tris/s to 16M tris/s (obviously due to the removal of
>>> the vtxfmt code in the driver). I'd say that's quite good overall.
>> This is great. At the very least it means that we don't have to
>> re-write the vtxfmt stuff before this code can be merged to the
>> trunk. I'm not sure where the speedups come from, but I'm not arguing
>> about it either.
> I'd guess it has to do with the _ae_loopback_array_elt function not
> being called anymore. The single-vertex-emit fallback really had a
> high impact (mostly because of the too large arrays). For some reason
> this fallback seemed faster in mesa 5, but with vbo it's gone (memcpy
> shows up rather prominently with oprofile now...). I'd have thought,
> though, that with swtnl it would have been more math limited rather than
> just limited by function calls...
It's possible that the little vertex cache in the split_copy routines
helps in this case as well. I wonder if disabling that makes the
speedup go away.

Keith

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev