Keith Whitwell wrote:

Thomas Hellström wrote:

Hi, list!

With display cards that have more and more hardware on them (TV capture, MPEG decoders, etc.) that can work independently of one another but share the same DMA engine, I've found the need for more than one hardware lock.


The first question is - have you found that lock contention is actually a problem?

I've done a simple implementation for the MPEG decoder in the via driver, but it doesn't cover the DMA case. The question arises: why should I need to wait for DMA quiescence to check whether the decoder is done with a frame, if there is no decoder data in any of the pending DMA buffers?


But this question isn't really answered by having multiple locks - it sounds more like you want some sort of IRQ notification or timestamping mechanism. Under normal circumstances grabbing the lock doesn't mean waiting for DMA quiescence.

The typical case here:

I want a DRI client to flip a video frame to screen, using a hardware entity called the HQV. This is a rather time-critical operation. To do this I have to take the hardware lock.

While this is happening, another thread is waiting for the MPEG decoder to complete a frame. To do that, this thread needs to take the hardware lock, wait for quiescent DMA, and then wait for the MPEG decoder to signal idle. It might be that the DMA command queue does not even contain MPEG data. This waiting delays the frame flip enough to create a visible jump in the video.

With multiple locks:

The first thread checks the HQV lock, it is available and frame flipping is done immediately.

The other thread meanwhile takes the MPEG engine lock, waits until the DMA engine has processed all MPEG commands in the command queue and then waits for the MPEG engine to be idle. DMA might still be processing 3D commands.

In the VIA / Unichrome case alone there is a need for even more such locks for different parts of the chip if one were to make a clean implementation of drivers for all features that are on the chip.

My idea would be to extend DRM with options for multiple locks, and I suspect not only VIA cards could benefit from this. I was thinking of:


For many cards, there is a single dma-driven command queue, and the lock is used to protect that queue. All sorts of stuff (video, 2d, 3d) is delivered on the same queue. It sounds like the VIA driver follows a similar single-queue model.

Yes.

1. A separate sarea to contain these locks, to avoid messing up the current sarea with binary incompatibilities as a consequence.
2. Other kernel modules should be able to take and release these locks. (V4L for example).
3. Each DMA buffer (or, in the VIA case, each submission to the ring buffer) is marked according to whether it accesses the resource that is protected by a certain lock.
4. A resource will become available to a client when the client has taken the lock and there are no pending DMA buffers (or parts of buffers) marked as touching this resource.
5. The client is responsible for reinitializing the resource once the lock is taken.


But it still sounds like there is a single ring buffer, right? Won't you need a lock to protect the ringbuffer? Won't everything have to grab that lock?

Only while submitting command buffer data. This will hopefully be a very fast operation. The IOCTL copying this data to the ring buffer will check that all locks are held for the hardware entities that the submitted command batch touches. The user will have to tell the IOCTL which entities these are, or possibly the command verifier checks this, but I consider that overkill since it is more of a bug check than a security check.

Also, how does direct framebuffer access work? The X server presumably now has to grab all of the locks, and likewise 3d fallbacks, to prevent all access to the framebuffer?

The current heavyweight lock will protect framebuffer areas that are not managed by the DRM memory manager. The IOCTL checking DMA submission will check that this lock is held for 3D and 2D engine command submissions that touch this area. This will guarantee compatibility with the current DRI locking mechanism. Still, it will be possible to submit, for example, MPEG data to the command queue without taking the heavyweight lock, or to submit 2D engine commands that blit one off-screen MPEG framebuffer to another. These operations should not need to wait for software rendering into other parts of the framebuffer.

These are just initial thoughts. Is there a mechanism for this in DRM today or could
it be done in a better way?


I guess I'm not sure which problem you're trying to solve. There are a couple I can think of so I'll list them here:

    - Lock contention.  Under what circumstances?

    - Unnecessary flushing of the DMA queue/ringbuffer. I.e., if you want to write to or read from a surface in video RAM, how do you know when the video card has finished with it?

    - Something else?

Keith


I hope I described the problem and the proposed way to solve it. Further comments are appreciated.

/Thomas

