Hello,

I have some ideas about tuning the antialiased rendering of my pipeline, because the current approach (XPutImage of alpha data to a pixmap) has some performance drawbacks.

The problems are:
- The pipeline uses XPutImage even if SHM is available, forcing image data over (local) sockets. This does, however, have the advantage of not having to XSync.
- Xlib based on XCB uses a 4 KB buffer, so the buffer is flushed after only a few alpha tiles (4-10). I see a high context-switch rate (15,000 cs/s) when running the line-anim demo in antialiased mode.
- Composition of small tiles (32x32 and smaller) is quite slow because of high per-primitive overhead (this will hopefully change soon; I can't do much about that).
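As a rough back-of-the-envelope check on the buffer point above, the sketch below estimates how many whole tile uploads fit into the output buffer before it has to be flushed. It assumes the 4 KB buffer mentioned above, the 24-byte fixed part of a core-protocol PutImage request, and 8-bit alpha rows padded to 4 bytes; the padding is an assumption, and the function name is hypothetical.

```python
def tiles_per_flush(buffer_bytes, tile_w, tile_h, header_bytes=24):
    """Estimate how many whole PutImage requests fit in the output buffer."""
    # Assumption: one byte per alpha pixel, each row padded to a 4-byte boundary.
    row_bytes = (tile_w + 3) // 4 * 4
    # Each tile is sent as one request: fixed header plus the padded pixel rows.
    request_bytes = header_bytes + row_bytes * tile_h
    return buffer_bytes // request_bytes


if __name__ == "__main__":
    for w, h in [(32, 32), (32, 16), (16, 16)]:
        print(f"{w}x{h}: {tiles_per_flush(4096, w, h)} tiles per flush")
```

Under these assumptions a full 32x32 tile leaves room for only about 3 requests per flush and a 32x16 tile for about 7, which is in the same range as the "few tiles" observed above.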
I cannot simply switch to SHM because that would (as far as I understand) require synchronization with the X server. I did some benchmarks using SHM, and performance was a lot worse than without it, because the advantage of not having to copy image data was completely cancelled out by the cost of the round trip.

SHM could be an advantage if used like this:
- Allocate a large number of alpha-tile-sized pixmaps (128 or even more).
- Fill those alpha tiles one after another without syncing.
- Sync and composite when the mask-fill operation is done.

Depending on the values, it could be decided whether using SHM would be beneficial (e.g. for large antialiased shapes) or whether using XPutImage would be better. However, this would require some more information from the tile generator, such as the total number of tiles. This information could be delivered to the pipeline either through the MaskFill/MaskBlit methods for every tile, or with an initialization method like initMaskBlit.

Maybe a bit more advanced would be the possibility of buffering MaskFills/MaskBlits until those 128+ alpha pixmaps are all used, and only flushing when really needed, but I don't know how this could be implemented. I guess a fair amount of tracking would be needed to make that possible.

Do you think passing down more information from the tile generators could be a bad idea?
Do you think other pipelines could benefit from such optimizations too?
How important is good AA performance considered to be?

Thank you in advance,
Clemens
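The SHM-versus-XPutImage decision described above can be sketched as a small cost comparison. This is only an illustration under stated assumptions, not the pipeline's actual logic: the function names, the cost units, and the linear cost model are all hypothetical. The pool size of 128 is taken from the proposal above, and real costs would have to come from measurements.

```python
import math


def syncs_needed(total_tiles, pool_size=128):
    """Round trips required if we sync only when the tile pool is exhausted."""
    return math.ceil(total_tiles / pool_size)


def choose_upload_path(total_tiles, tile_bytes, copy_cost_per_byte,
                       sync_cost, pool_size=128):
    """Pick the cheaper path under a simple, hypothetical linear cost model."""
    # XPutImage: every tile is copied through the socket, but no sync is needed.
    put_image_cost = total_tiles * tile_bytes * copy_cost_per_byte
    # SHM: no copy, but one round trip each time the pool of tiles is used up.
    shm_cost = syncs_needed(total_tiles, pool_size) * sync_cost
    return "shm" if shm_cost < put_image_cost else "putimage"


if __name__ == "__main__":
    # Made-up cost units, chosen only to show both outcomes.
    print(choose_upload_path(4, 1024, 0.001, 50.0))    # small shape
    print(choose_upload_path(500, 1024, 0.001, 50.0))  # large shape
```

With these made-up numbers the small shape stays on XPutImage and the large one switches to SHM, which is why the total number of tiles would need to be known before the first tile is uploaded.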
