FYI

-------- Original Message --------
Subject: {DirectFB} Core: Experimental gfxcard.c replacement code with huge optimization
Date: 13 Nov 2011 14:07:05 +0100
From: d...@directfb.org
To: directfb-...@directfb.org
New branch 'accel1' available with the following commits:

http://git.directfb.org/?p=core/DirectFB.git;a=commit;h=92c15f0612c55df9b95a83fb0d19804e4718fa48

commit 92c15f0612c55df9b95a83fb0d19804e4718fa48
Author: Denis Oliver Kropp <d...@directfb.org>
Date:   Mon Oct 31 23:44:07 2011 +0100

Core: Experimental gfxcard.c replacement code with huge optimization

This implementation does not lock/unlock buffers for each operation; instead it manages state and buffer locks lazily. If one calls SetColor and FillRectangles in a row, only the driver's SetState() is called, with no unlocking and relocking of buffers etc.

To achieve this, a new dispatch cleanup handler has been added. It is called before the next read() from the Fusion device to unlock the currently locked buffers and exit the currently active state.

There's still a lot of code to move from gfxcard.c (the actual rendering), but maybe it's worth thinking about a rework to support all kinds of cases: with/without hardware matrix/clipping, different primitives, etc.

The performance boost is awesome, up to 20x for some tests I ran. Here are a few results:

Benchmarking 100x100 on 852x464 RGB32 (32bit)...

Anti-aliased Text                        3.001 secs ( 3637.187 KChars/sec) [ 19.6%]
Anti-aliased Text (blend)                3.015 secs (  489.552 KChars/sec) [  4.9%]
Fill Rectangle                           3.003 secs ( 3303.030 MPixel/sec) [ 24.0%]
Fill Rectangle (blend)                   3.066 secs (  181.343 MPixel/sec) [  1.6%]
Fill Rectangles [10]                     3.018 secs ( 3479.125 MPixel/sec) [  4.6%]
Fill Rectangles [10] (blend)             3.351 secs (  182.035 MPixel/sec) [  0.2%]
Blit                                     3.005 secs ( 3346.422 MPixel/sec) [ 16.3%]
Blit 180                                 3.014 secs ( 1379.230 MPixel/sec) [  7.9%]
Blit colorkeyed                          3.015 secs ( 1271.973 MPixel/sec) [  7.3%]
Blit destination colorkeyed              3.012 secs ( 1403.054 MPixel/sec) [  7.9%]
Blit with format conversion              3.059 secs (  322.000 MPixel/sec) [  1.9%]
Blit with colorizing                     3.061 secs (  189.807 MPixel/sec) [  1.9%]
Blit from 32bit (blend)                  3.059 secs (  323.635 MPixel/sec) [  1.9%]
Blit from 32bit (blend) with colorizing  3.126 secs (   86.372 MPixel/sec) [  0.9%]
Blit SrcOver (premultiplied source)      3.037 secs (  526.506 MPixel/sec) [  3.3%]
Blit SrcOver (premultiply source)        3.035 secs (  548.599 MPixel/sec) [  3.3%]

Compared to the old code:

Benchmarking 100x100 on 852x464 RGB32 (32bit)...

Anti-aliased Text                        3.009 secs (  926.021 KChars/sec) [ 18.0%]
Anti-aliased Text (blend)                3.015 secs (  427.462 KChars/sec) [  4.9%]
Fill Rectangle                           3.010 secs (  655.813 MPixel/sec) [ 40.5%]
Fill Rectangle (blend)                   3.069 secs (  171.391 MPixel/sec) [  2.2%]
Fill Rectangles [10]                     3.019 secs ( 3093.739 MPixel/sec) [  5.6%]
Fill Rectangles [10] (blend)             3.326 secs (  180.396 MPixel/sec) [  0.3%]
Blit                                     3.037 secs (  466.249 MPixel/sec) [  6.6%]
Blit 180                                 3.051 secs (  406.751 MPixel/sec) [  5.5%]
Blit colorkeyed                          3.046 secs (  397.570 MPixel/sec) [  5.2%]
Blit destination colorkeyed              3.030 secs (  571.287 MPixel/sec) [  8.2%]
Blit with format conversion              3.079 secs (  220.850 MPixel/sec) [  2.2%]
Blit with colorizing                     3.072 secs (  131.510 MPixel/sec) [  2.2%]
Blit from 32bit (blend)                  3.097 secs (  188.246 MPixel/sec) [  2.2%]
Blit from 32bit (blend) with colorizing  3.136 secs (   77.487 MPixel/sec) [  0.9%]
Blit SrcOver (premultiplied source)      3.078 secs (  253.411 MPixel/sec) [  2.9%]
Blit SrcOver (premultiply source)        3.068 secs (  265.319 MPixel/sec) [  2.9%]

Compared to the new code, but running as master (the new mechanism leverages async FusionCalls):

Benchmarking 100x100 on 852x464 RGB32 (32bit)...

Anti-aliased Text                        3.000 secs ( 1582.800 KChars/sec) [ 99.3%]
Anti-aliased Text (blend)                3.003 secs (  402.797 KChars/sec) [ 99.6%]
Fill Rectangle                           3.000 secs ( 1978.000 MPixel/sec) [ 99.6%]
Fill Rectangle (blend)                   3.001 secs (  172.609 MPixel/sec) [ 99.6%]
Fill Rectangles [10]                     3.002 secs ( 3214.523 MPixel/sec) [ 99.6%]
Fill Rectangles [10] (blend)             3.049 secs (  180.387 MPixel/sec) [ 99.6%]
Blit                                     3.001 secs (  522.159 MPixel/sec) [ 99.3%]
Blit 180                                 3.000 secs (  424.333 MPixel/sec) [ 99.6%]
Blit colorkeyed                          3.002 secs (  413.724 MPixel/sec) [ 99.3%]
Blit destination colorkeyed              3.001 secs (  615.794 MPixel/sec) [ 99.3%]
Blit with format conversion              3.000 secs (  225.333 MPixel/sec) [ 99.6%]
Blit with colorizing                     3.003 secs (  143.856 MPixel/sec) [ 99.6%]
Blit from 32bit (blend)                  3.002 secs (  207.861 MPixel/sec) [ 99.3%]
Blit from 32bit (blend) with colorizing  3.006 secs (   74.184 MPixel/sec) [ 99.6%]
Blit SrcOver (premultiplied source)      3.001 secs (  274.908 MPixel/sec) [ 99.0%]
Blit SrcOver (premultiply source)        3.000 secs (  286.333 MPixel/sec) [ 99.6%]

YES, it is slower than running as a slave, because the master does not go through FusionCall!

 lib/fusion/fusion.c                 |  68 +++++
 lib/fusion/fusion.h                 |  19 ++
 lib/fusion/fusion_internal.h        |   2 +
 src/core/CoreGraphicsState_real.cpp | 465 +++++++++++++++++++++++++++++++++-
 src/core/gfxcard.c                  |   2 +-
 src/core/graphics_state.h           |  18 +-
 src/core/state.h                    |   6 +-
 src/gfx/clip.h                      |   2 +-
 src/gfx/generic/generic.c           | 197 +++++++++++-----
 src/gfx/generic/generic.h           |   3 +-
 10 files changed, 704 insertions(+), 78 deletions(-)

http://git.directfb.org/?p=core/DirectFB.git;a=commit;h=364bbdb150032eed9d69e5a0acfdd6f976a38770

commit 364bbdb150032eed9d69e5a0acfdd6f976a38770
Author: Denis Oliver Kropp <d...@directfb.org>
Date:   Sun Nov 13 13:49:45 2011 +0100

Core: Use new "queue" property for rendering and state setting methods.

Surface setters are excluded because of out-of-order execution: references could be dropped right after blitting from a surface, but before the queue is flushed.

 src/core/CoreGraphicsState.flux | 24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

_______________________________________________
directfb-cvs mailing list
directfb-...@directfb.org
http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-cvs

_______________________________________________
directfb-dev mailing list
directfb-dev@directfb.org
http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-dev