On Thu, 2008-06-12 at 17:17 +0200, Thomas Hellström wrote:
> Eric Anholt wrote:
> > We're getting close to ready to mark GEM on Intel as done.  We've got
> > one failing testcase that we isolated this week with interrupt handling,
> > and we've got a fix in testing that appears to be doing the job.
> >
> > Tomorrow I'm planning on merging the GEM code to master of all 3
> > repositories.  At that point, I'll cut a branch called drm-ttm in drm
> > with the existing interface and support.  After that I'm planning on the
> > following changes:
> >
> > 1) Remove TTM code from i915.
> > We're going forward with GEM and not going to support two memory
> > managers in our driver.  GEM's got the features we need and equivalent
> > performance, with a simpler implementation.
> >
> This is going to hurt badly, since IMHO the GEM implementation is quite
> far from TTM in terms of performance and stability: Again some
> benchmarks with a i915 dual-channel 3GHz celeron. I've also, for
> reference added in timings of a (polling only) CX700 Via C7 at 1.5GHz
> running single-channel on untiled memory.
> In all of these timings, reportdamage was turned off since with it, the
> X server tends to consume a lot of CPU, confusing the figures somewhat.
>
> *GEARS* (Should be GPU bound)
> i915tex (TTM):          1035fps @ 70% CPU
> GEM, no buffer reuse:    863fps @ 95% CPU
> GEM, buffer reuse:      1000fps @ 80% CPU
> Unichrome CX700         1009fps @ 70% CPU
>
> *Openarena +exec anholt @640x480* (Should be GPU bound. Not on the
> CX700, though).
> i915tex (TTM):          48fps,   17.1 user, 2.6 system
> GEM, no buffer reuse    30.4fps, 17.0 user, 8.0 system (Occasional stutter)
> GEM, buffer reuse       42.5fps, 16.0 user, 4.7 system (Occasional stutter)
> Unichrome CX700         24.5fps, 39.0 user, 3.0 system
>
> *"Ipers" without help screen to avoid sw rendering:* (CPU bound)
> i915tex (TTM):          250000 polys/s
> GEM, no buffer reuse    141000 polys/s (Severe text rendering errors)
> GEM, buffer reuse       146000 polys/s (Severe text rendering errors)
> Unichrome CX700         205000 polys/s

50% of the CPU in fallback rendering for glBitmap.  That software
fallback is also apparently incorrect, resulting in your text rendering
errors.

> *"Teapot" without help screen to avoid sw rendering:* (Might be GPU
> bound on i915tex, CPU-bound on others)
> i915tex (TTM):          74 fps
> GEM, no buffer reuse:   23.8 fps (Severe text rendering errors)
> GEM, buffer reuse       24.5 fps (Severe text rendering errors)
> Unichrome CX700         36 fps

Same, just sw fallback.

> *"texdown" processor -> GPU transfer benchmark. Teximage and texsubimage.*
> i915tex (TTM):          1360 MB/s, 970 MB/s
> GEM, no buffer reuse:    556 MB/s, 138 MB/s (Severe text rendering errors)
> GEM, buffer reuse:       556 MB/s, 139 MB/s (Severe text rendering errors)
> Unichrome CX700          690 MB/s, 658 MB/s

I still haven't had time to analyze texdown, sorry.  I just tried
mplayer -vo gl, since people say that's what texdown's supposed to be
modeling, but it was at 6% cpu, and only 2% of that 6% was in TexImage.
I'm really having a hard time caring until someone comes up with
something other than a microbenchmark that has issues with teximage
performance.

> I guess these benchmarks speak for themselves. Obviously, with more GPU
> bound applications, the various Intel flavours should perform quite
> equal, as long as the GPU is continously fed new data.

If this was a test of just two memory manager implementations, the
benchmarks would speak for themselves.  However, there are at least two
driver changes I caught on first review of gallium-i915-current's
i915simple (which I assume is what you were testing, given that the last
tests I've heard from you guys were using that) that would have an
impact on performance:

1) lost_context is #if 0ed out.

This is important for multiple-client stability, as it sets up the
hardware in case someone else uses it.  If I disable it on master, I get
the following with OA on a 945G (3Ghz P4):

enabled:
74.1 fps  11.65s user 2.92s system 95% cpu 15.200 total
74.0 fps  11.44s user 3.10s system 96% cpu 15.142 total
73.9 fps  11.32s user 3.07s system 94% cpu 15.166 total

disabled:
76.0 fps  11.24s user 2.90s system 95% cpu 14.844 total
76.1 fps  11.36s user 2.95s system 95% cpu 14.952 total
76.0 fps  11.55s user 2.79s system 96% cpu 14.908 total

2) indirect vertex buffers are used instead of immediate.

I'm hoping this has a big effect.  We spend a lot of CPU time in GEM on
i915 in submitting our batch buffers, and it's almost all due to running
out of space when emitting vertices.  To try to simulate what not
blowing out the batchbuffer and making excessive trips to the kernel
would mean, I tried increasing the batch size 4x:

16kb:
74.1 fps  11.65s user 2.92s system 95% cpu 15.200 total
74.0 fps  11.44s user 3.10s system 96% cpu 15.142 total
73.9 fps  11.32s user 3.07s system 94% cpu 15.166 total

64kb:
78.6 fps  11.22s user 2.62s system 95% cpu 14.507 total
78.7 fps  11.24s user 2.59s system 95% cpu 14.494 total
78.6 fps  11.26s user 2.61s system 95% cpu 14.547 total

However, increasing batch size has a negative reinforcement, as we'll
eat extra cost in clflushing the whole object on small batch buffers
(blits, for example, end up in their own little buffer).  The plan has
been to do a smarter pwrite that either
1) does the copy+clflush in a single mapping of the page affected, which
   would also limit clflushing to just the area affected, or
2) writes through the gtt when the object happens to already have a gtt
   mapping.

All tests were done by running 4 iterations of the openarena demo and
throwing out the first.

> Apart from the span rendering problems above, the "subtexrate" test
> rendered very incorrectly with GEM and the X server died on multiple
> occasions when doing quick "konsole" scrolling. The text rendering
> errors were not seen on a i945 laptop.
>
> My humble suggestion is that these performance-, rendering and
> stability problems are addressed before a merge.

Hopefully the USER_INTERRUPT fix I just pushed addresses the stability
problem.  It fixed our demo loop hangs in an overnight test here.

--
Eric Anholt                             [EMAIL PROTECTED]
[EMAIL PROTECTED]                         [EMAIL PROTECTED]
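
One way to read the OpenArena timings in the message above is to average
the three retained runs of each configuration and compare them against
the baseline.  The sketch below does only that arithmetic.  The fps
values are copied from the message; the dictionary labels and the helper
name gain_pct are made up for illustration, and since the "enabled" and
"16kb" rows are identical in the message they are treated as a single
baseline here.

```python
from statistics import mean

# fps of the three retained OpenArena runs per configuration, copied from
# the message. The labels are illustrative, not identifiers from the driver.
BASELINE = "baseline (lost_context enabled, 16kb batch)"
runs = {
    BASELINE: [74.1, 74.0, 73.9],
    "lost_context disabled": [76.0, 76.1, 76.0],
    "64kb batch": [78.6, 78.7, 78.6],
}

# Average fps per configuration.
averages = {name: mean(fps) for name, fps in runs.items()}
baseline = averages[BASELINE]


def gain_pct(fps: float, reference: float) -> float:
    """Relative change of fps against reference, in percent."""
    return (fps - reference) / reference * 100.0


for name, avg in averages.items():
    print(f"{name}: {avg:.2f} fps ({gain_pct(avg, baseline):+.1f}%)")
```

On these figures, disabling lost_context is worth roughly 2.7% and the
4x larger batch roughly 6.3% over the baseline average of 74.0 fps.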
--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel