[Dri-devel] Re: PCI Bus and Mach64's DMA ring

Linus Torvalds Fri, 14 Jun 2002 10:08:57 -0700

On Fri, 14 Jun 2002, José Fonseca wrote:
>
> So to avoid constantly checking for completion before asking the engine
> to process new entries, we devised a different scheme:
>
>   - after adding new entries to the ring
>
>   - toggle the end flag of the previous last entry, so that the engine
> will also process our just committed buffers

If this is in non-coherent memory (AGP), I hope you do an "sfence" in
between those two stages?  You also need to make sure that the compiler
hasn't re-ordered them (ie a compiler barrier() in between), regardless of
memory ordering.
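
A minimal sketch of the two stages with a fence in between, assuming GCC or
Clang. The descriptor layout, RING_END_FLAG and ring_publish() are
hypothetical names for illustration; the real Mach64 descriptor format is
not given in this thread. The "memory" clobber on the inline asm is what
stops the compiler from re-ordering the two stages, and the sfence itself
orders the stores as seen outside the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor; not the actual Mach64 layout. */
struct ring_entry {
    uint32_t addr;
    uint32_t cmd;   /* bit 31 is used here as a made-up "end of ring" flag */
};

#define RING_END_FLAG 0x80000000u

#if defined(__x86_64__) || defined(__i386__)
/* sfence orders earlier stores before later ones; the "memory"
 * clobber also makes this a compiler barrier. */
#define store_fence() __asm__ __volatile__("sfence" ::: "memory")
#else
/* Fallback for other architectures: a full fence from the compiler. */
#define store_fence() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

static void ring_publish(volatile struct ring_entry *ring,
                         unsigned prev_last, unsigned new_last,
                         uint32_t addr, uint32_t cmd)
{
    assert(prev_last != new_last);

    /* Stage 1: fill in the new entry and mark it as the new end. */
    ring[new_last].addr = addr;
    ring[new_last].cmd  = cmd | RING_END_FLAG;

    /* The new entry must be visible before the old end flag goes away. */
    store_fence();

    /* Stage 2: clear the end flag on the previous last entry.
     * This is a plain read-modify-write, not a locked cycle. */
    ring[prev_last].cmd &= ~RING_END_FLAG;
}
```

Without the fence, the engine could see the cleared end flag on the old
entry before the new entry's contents have reached memory, and run past the
end of valid data.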

I also hope you do the toggle with a locked cycle so that you don't lose
any information..
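
One way to get a locked cycle from C is a compiler atomic builtin. The
sketch below assumes GCC or Clang; RING_END_FLAG and toggle_end_flag() are
hypothetical names. On x86 these builtins are typically lowered to
lock-prefixed instructions, so the read-modify-write cannot be split by
another agent doing a locked access to the same word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_END_FLAG 0x80000000u   /* hypothetical flag bit */

/* Atomically flip the end flag in a command word. */
static void toggle_end_flag(uint32_t *cmd)
{
    assert(cmd != NULL);
    /* Atomic XOR: the load, flip and store happen as one indivisible step. */
    (void)__atomic_fetch_xor(cmd, RING_END_FLAG, __ATOMIC_SEQ_CST);
}
```

Whether the device itself honours locked cycles on this memory is a
separate hardware question that the sketch does not answer.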

>   - check if it's idle (due to lack of timely buffer additions) and ask it
> to process the remaining entries (if there are any) from the position
> where it previously stopped
>
> Although we still need to check for idle, the engine is idle far less
> often, and that really makes a difference in the fps obtained on slow
> machines (+20%)

Sounds good to me.

> This works really well, but on a slow machine I experience a lockup
> every once in a while. The register and memory dumps (when
> available) show that the engine jumps to arbitrary positions in memory
> instead of continuing to read inside the piece of memory we supplied for
> the ring buffer. My suspicion is that there is some kind of race
> condition when accessing the previous last entry, but here is where my
> knowledge starts to fail:
>
>   - Are the system memory accesses by the processor and the bus serialized
> or concurrent?

Intel calls their write ordering "processor ordering", and all writes on
an Intel CPU should be visible to other CPUs in the order they were done
on a logical level.

HOWEVER, that is only true for the cache coherency protocol. If some
client is not cache-coherent, or if the CPU has been told that the memory
area is not regular cacheable RAM (ie WT), that is no longer true, and
you need to have an explicit sfence in between the writes.

>   - Has this problem ever appeared on the Linux kernel before, and how was
> it solved?

There have been some similar issues on USB, although they weren't exactly
the same.

On USB, as on the mach64, the controller DMA's the commands from memory,
and you don't want to shut down the controller just to add or remove a
command. So the USB drivers have some of the same issues.

However, the USB setup is different enough that they don't see your exact
problem. The controller will walk a circular buffer of linked lists
forever, so if you race with command insert, it will be seen the next time
around the list instead. The remove case is the much more interesting one
on USB (the USB controller might just be executing the thing that you're
trying to remove), but you don't have that problem.
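
The insert case on a circular list can be sketched as follows, assuming GCC
or Clang atomic builtins. struct node and list_insert_after() are
hypothetical; a real USB driver works with DMA addresses and the
controller's own descriptor format. The point is only the ordering: the new
node is fully set up before the single store that links it in, so a walker
either sees the old list or the complete new one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node walked forever by some other agent. */
struct node {
    struct node *next;
    int payload;
};

static void list_insert_after(struct node *prev, struct node *n)
{
    assert(prev != NULL && n != NULL);

    /* 1: the new node already points back into the ring. */
    n->next = prev->next;

    /* 2: release store, so step 1 is ordered before the link-in. */
    __atomic_store_n(&prev->next, n, __ATOMIC_RELEASE);
}
```

If the walker has already passed prev when the store lands, it simply picks
up the new node on its next trip around the list.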

The USB lists are also all in regular RAM (no AGP stuff that might turn
off the normal cache protocol), so the regular wmb() macro works fine on
it (although some architectures really want a separate "write barrier for
DMA" and consider barriers for CPU vs IO to be very different, oh well).

You might ask the USB people if they have any ideas.

>   - How does the presence of an AGP aperture or MTRRs covering that memory
> affect that access?

See above about the memory ordering.

                                Linus

