Mark Vojkovich wrote:
On Tue, 2 Mar 2004, Sottek, Matthew J wrote:

 It's currently global because the hardware I work on doesn't
have to fall back to software very often.  Bookkeeping on a per-
surface basis is a simple modification and one I will add.  This
precludes using XAA2 with hardware that doesn't support concurrent
SW and HW access to the framebuffer, but that's OK since that
stuff is old and we're trying to move forward here.  HW that sucks
can use the old XAA.

It shouldn't preclude this from working. You just need the call to look like sync(xaa_surface_t *surface) and let old hardware sync the whole engine regardless of the input. It helps those who can use it and is the same as what you have now for everyone else.

I don't understand your reasoning.


The difference between per-surface and global sync state is that
you don't have to sync when the CPU renders to a surface that has
no outstanding, unsynced GPU rendering. The point of this is to
ALLOW concurrent CPU and GPU rendering into video ram except in
the case where both want to render to the same surface. There is
old hardware that allows no concurrent CPU and GPU rendering at
all.
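
A minimal sketch of the per-surface bookkeeping described above, assuming a single "GPU rendering pending" flag per surface. Only the shape sync(xaa_surface_t *surface) comes from the thread. The struct field, driver_sync(), gpu_render(), cpu_render() and the wait counter are hypothetical names for illustration, not the XAA2 interface.

```c
#include <stdbool.h>

/* Hypothetical surface record: one flag of sync state per surface. */
typedef struct {
    bool gpu_pending;  /* GPU has rendered here since the last sync */
} xaa_surface_t;

/* Counts engine waits so the effect of the bookkeeping is observable. */
static int engine_idle_waits = 0;

/* Stand-in for a driver's sync hook. Old hardware may ignore the
 * argument and simply wait for the whole engine to go idle. */
static void driver_sync(xaa_surface_t *surface)
{
    (void)surface;
    engine_idle_waits++;
}

/* Called when an accelerated operation is queued against a surface. */
static void gpu_render(xaa_surface_t *surface)
{
    surface->gpu_pending = true;
}

/* Called before a software fallback touches a surface. */
static void cpu_render(xaa_surface_t *surface)
{
    if (surface->gpu_pending) {
        driver_sync(surface);
        surface->gpu_pending = false;
    }
    /* ... software rendering into the surface would happen here ... */
}
```

With this, software rendering to a surface the GPU has not touched costs no sync at all. If the driver idles the whole engine, the flags on other surfaces stay set, which is conservative but still correct.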


Even with Sync() passing the particular surface which is necessitating
the sync, I would expect all drivers to be syncing the whole chip
without caring what the surface was. Most hardware allows you to
do checkpointing in the command stream so you can tell how far
along the execution is, but a Sync can happen at any time. Are
you really going to be checkpointing EVERY 2D operation?

Not every operation, but every "few" operations. For example, the Radeon 3D driver has a checkpoint at the end of each DMA buffer. It's more coarse grained than every operation, but it's much finer grained than having to wait for the engine to idle.
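
One way to picture checkpoint-based syncing, as a sketch only: each DMA buffer ends with a checkpoint carrying a sequence number, each surface remembers the last buffer that drew to it, and a sync waits only until the engine has passed that checkpoint. All names here (engine_t, surface_t, submit_buffer, engine_retire_one, sync_surface) are hypothetical, and the "engine" is simulated by a counter. This is not the Radeon driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t submitted;  /* sequence number of the last DMA buffer queued */
    uint32_t completed;  /* last checkpoint the engine has passed */
} engine_t;

typedef struct {
    uint32_t last_seq;   /* checkpoint of the last buffer drawing here */
} surface_t;

/* Queue one DMA buffer whose commands target the given surface. */
static uint32_t submit_buffer(engine_t *e, surface_t *target)
{
    e->submitted++;
    target->last_seq = e->submitted;
    return e->submitted;
}

/* Simulated hardware progress: the engine finishes one buffer. */
static void engine_retire_one(engine_t *e)
{
    if (e->completed != e->submitted)
        e->completed++;
}

/* Wraparound-safe test: has the engine passed this surface's checkpoint? */
static bool surface_busy(const engine_t *e, const surface_t *s)
{
    return (int32_t)(s->last_seq - e->completed) > 0;
}

/* Wait only until the surface's checkpoint is reached, not until idle.
 * Returns how many buffers had to retire while waiting. */
static unsigned sync_surface(engine_t *e, const surface_t *s)
{
    unsigned waited = 0;
    while (surface_busy(e, s)) {
        engine_retire_one(e);  /* a real driver would poll the hardware */
        waited++;
    }
    return waited;
}
```

The granularity is one buffer: a sync on an early surface can return while later buffers are still executing.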


I still contend that it would be a benefit to know how many
rects associated with the same state are going to be sent
(even if you send those in multiple batches because of array
size limitations). This allows a driver to batch things up as
it sees fit.

I don't know the amount of data coming. The old XAA (and
cfb for that matter) allocated for the pathological case: number
of rects times number of clip rects. It doesn't know how many
there are until it's done computing them, but it knows the
upper bound. I have seen this be over 8 Meg! The new XAA
has a preallocated scratch space (currently a #define for the
size). When the scratch buffer is full, it flushes it out to
the driver. If XAA is configured to run with minimal memory,
the maximum batch size will be small.
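
The scratch-buffer scheme can be sketched roughly as follows, assuming a fixed array of rects that is handed to a driver callback whenever it fills. SCRATCH_RECTS, rect_t, scratch_t and the callback signature are hypothetical; the post says only that the size is currently a #define.

```c
#include <stddef.h>

#define SCRATCH_RECTS 4  /* hypothetical size; kept tiny for illustration */

typedef struct { int x, y, w, h; } rect_t;

/* Hypothetical driver hook that receives one batch of rects. */
typedef void (*fill_rects_fn)(const rect_t *rects, size_t n, void *ctx);

typedef struct {
    rect_t        rects[SCRATCH_RECTS];
    size_t        used;
    fill_rects_fn flush_fn;
    void         *ctx;
} scratch_t;

static void scratch_init(scratch_t *s, fill_rects_fn fn, void *ctx)
{
    s->used = 0;
    s->flush_fn = fn;
    s->ctx = ctx;
}

/* Hand whatever has accumulated to the driver and start over. */
static void scratch_flush(scratch_t *s)
{
    if (s->used > 0) {
        s->flush_fn(s->rects, s->used, s->ctx);
        s->used = 0;
    }
}

/* Append one clipped rect; flush as soon as the buffer is full. */
static void scratch_add(scratch_t *s, rect_t r)
{
    s->rects[s->used++] = r;
    if (s->used == SCRATCH_RECTS)
        scratch_flush(s);
}

/* Example callback that only records batch statistics. */
typedef struct { size_t batches, rects, largest; } batch_stats_t;

static void count_batch(const rect_t *rects, size_t n, void *ctx)
{
    batch_stats_t *st = ctx;
    (void)rects;
    st->batches++;
    st->rects += n;
    if (n > st->largest)
        st->largest = n;
}
```

Memory use is bounded by the scratch size rather than by rects times clip rects, at the cost of the driver never seeing the total count up front. The caller does a final scratch_flush() once it has finished computing rects.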

That sounds reasonable. That's basically how the 3D drivers work.



_______________________________________________ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel
