I noticed the following - XAACopyArea() only attempts to use
accelerated WriteImage() when writing to a DRAWABLE_WINDOW but not
on off-screen pixmaps. I used the following changes to make it work:

diff -u -w -r1.1.1.3 xaaCpyArea.c
--- xaaCpyArea.c   9 Jun 2001 15:09:02 -0000
+++ xaaCpyArea.c     3 Mar 2008 20:51:05 -0000
@@ -64,9 +64,16 @@
           return (XAABitBlt( pSrcDrawable, pDstDrawable,
             pGC, srcx, srcy, width, height, dstx, dsty,
             XAADoBitBlt, 0L));
+    } else {
+        if(infoRec->ScreenToScreenBitBlt &&
+         CHECK_ROP(pGC,infoRec->ScreenToScreenBitBltFlags) &&
+         CHECK_ROPSRC(pGC,infoRec->ScreenToScreenBitBltFlags) &&
+         CHECK_PLANEMASK(pGC,infoRec->ScreenToScreenBitBltFlags))
+            return (XAABitBlt( pSrcDrawable, pDstDrawable,
+            pGC, srcx, srcy, width, height, dstx, dsty,
+            XAADoImageWrite, 0L));

This does not look correct.  Shouldn't this be more in line with
the case where the destination drawable is a window?  (i.e. test the
bitsPerPixel values and WritePixmap flags instead of ScreenToScreenBitBlt).
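For illustration, here is a minimal, self-contained sketch of the kind of test suggested above: check that the driver offers a pixmap-write hook and that source and destination depths are compatible, rather than checking for screen-to-screen blit support. The struct, the flag name and value, and the function name are all stand-ins invented for this example; they are not the real XAA definitions, and the 32-to-24 conversion capability is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not the real XAA types or flag values. */
#define SKETCH_CONVERT_32BPP_TO_24BPP 0x01u

struct sketch_info {
    bool     have_write_pixmap;   /* driver supplied a pixmap-write hook */
    unsigned write_pixmap_flags;  /* capability flags for that hook */
};

/*
 * Returns true if an accelerated image write may be attempted:
 * the hook must exist, and the depths must either match or be a
 * 32 -> 24 bpp pair the driver claims it can convert.
 */
static bool
sketch_can_image_write(const struct sketch_info *info,
                       int src_bpp, int dst_bpp)
{
    if (!info->have_write_pixmap)
        return false;
    if (src_bpp == dst_bpp)
        return true;
    return dst_bpp == 24 && src_bpp == 32 &&
           (info->write_pixmap_flags & SKETCH_CONVERT_32BPP_TO_24BPP) != 0;
}
```

In the real function the CHECK_ROP, CHECK_ROPSRC and CHECK_PLANEMASK tests from the patch would presumably still apply, but against the flags of the write hook rather than ScreenToScreenBitBltFlags.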

The whole logic looks a little bit fishy. I used the first if()'s
source-in-memory branch at first but wasn't quite sure that was doing
the right thing. Where it is now looked better to me, but I won't claim
I completely understand XAA's inner voodoo. All I want is to make
XAA use ImageWrite()s for all RAM-to-VRAM transfers if the driver
supports it.
Otherwise, teaching the framebuffer layer to cope with a tiled
framebuffer might be necessary in the long run, any pointers where to

Several drivers (radeon, intel, savage) in the Xorg tree provide
support for various tiling methods.  Generally the chip provides a
surface control or aperture for exposing a tiled region to the CPU as
a linear surface.  For acceleration, you have to keep track of what
buffers are tiled in the driver and do the right thing with the
blitter when using those surfaces.
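To make the tiling bookkeeping a bit more concrete, here is a rough sketch of how a pixel coordinate could map to a byte offset inside a tiled surface. The tile geometry (64x16 pixels, 4 bytes per pixel, tiles stored row-major) and the function name are invented for illustration; each chip defines its own layout, so treat this as an assumption rather than a description of any particular driver.

```c
#include <stddef.h>

/* Hypothetical tile geometry; real hardware differs. */
#define TILE_W    64  /* tile width in pixels  */
#define TILE_H    16  /* tile height in pixels */
#define BPP_BYTES 4   /* bytes per pixel       */

/*
 * Byte offset of pixel (x, y) in a surface made of TILE_W x TILE_H
 * tiles laid out row-major, tiles_per_row tiles across.
 */
static size_t
tiled_offset(unsigned x, unsigned y, unsigned tiles_per_row)
{
    unsigned tx = x / TILE_W, ty = y / TILE_H;  /* which tile      */
    unsigned ix = x % TILE_W, iy = y % TILE_H;  /* position inside */
    size_t tile_index = (size_t)ty * tiles_per_row + tx;
    size_t tile_bytes = (size_t)TILE_W * TILE_H * BPP_BYTES;

    return tile_index * tile_bytes +
           ((size_t)iy * TILE_W + ix) * BPP_BYTES;
}
```

A driver that tracks which buffers are tiled would use a mapping like this (or let the chip's aperture do it) only for those buffers and plain linear addressing for the rest.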

Yeah, I'm dimly aware of these things. My problem is that the hardware
in question doesn't give me a linear view of the framebuffer. All I
have is a small linear buffer I can use to DMA data in or out of the
tiled framebuffer.

The other problem is that the machine's native pixel format is RGBA;
if I want 24-bit colour that's the only one I can use. Fortunately the
DMA engine can convert pixels on the fly, so I can pretend it's ABGR.
So pixels would have to be endian-flipped as well when the fb layer
accesses VRAM. Is there any prior art for that?

So far my driver supports image writes, screen-to-screen copies,
rectangle fills, solid and dashed lines, colour expansion and alpha
textures. ARGB textures should work as well, but I couldn't find a user
for those: nothing in xfce4 or windowmaker seems to do anything with
them.

Another thing: I sprinkled xf86Msg()s all over XAA, and sometimes
XCopyArea() calls seem to end up writing to the framebuffer but not
through XAACopyArea(). Any ideas?
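On the endian-flipping question: going between RGBA and ABGR byte order for 32-bit pixels is a plain byte reversal of each word. A minimal sketch in portable C follows; the function names are made up for this example, and it says nothing about where in the fb layer such a swap would have to be hooked in.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of one 32-bit pixel (RGBA <-> ABGR). */
static uint32_t
swap_pixel32(uint32_t p)
{
    return (p >> 24) |
           ((p >> 8) & 0x0000ff00u) |
           ((p << 8) & 0x00ff0000u) |
           (p << 24);
}

/* Swap a run of pixels in place, e.g. one scanline in a staging buffer. */
static void
swap_span32(uint32_t *px, size_t count)
{
    for (size_t i = 0; i < count; i++)
        px[i] = swap_pixel32(px[i]);
}
```

Applying the swap twice gives back the original pixel, so the same routine would serve for both reads and writes.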

have fun
