On 30 Oct 1999, Marcus Sundberg wrote:

> "Jon M. Taylor" <[EMAIL PROTECTED]> writes:
> 
> > On Fri, 29 Oct 1999, Per Wigren wrote:
> > > and GGI using
> > > fbdev as a target is much slower than KGI.
> > 
> >     Basic fbdev drivers are unaccelerated, so this is true.  However,
> > KGIcon drivers are able to handle kgicommand ioctls through the fbdev
> > interface, and as such can be fully accelerated.
> 
> Note however that the direct acceleration you get using matroxfb
> is 4-6 times faster than the ioctl() based acceleration of the Matrox
> KGIcon driver. 

        Yes, the inherent latency of the kernel-user ring transition
really shows up with ioctls vs. direct register programming.  Command
buffering, if implemented properly, should narrow this latency gap to
negligible levels.
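
        A rough userspace sketch of the command buffering idea, to show
why it narrows the gap: commands are queued in a buffer and handed over
in batches, so the cost of one kernel crossing is spread over many
commands.  Every name here (struct cmdbuf, cmdbuf_queue(), cmdbuf_flush(),
the capacity of 64) is hypothetical and not taken from KGIcon, and the
flush is only a counter standing in for the single ioctl() a real driver
would issue.

```c
#include <stddef.h>

#define CMDBUF_CAPACITY 64      /* hypothetical batch size */

/* One queued accelerator command: a register offset and the value for it. */
struct accel_cmd {
    unsigned int reg;
    unsigned int value;
};

struct cmdbuf {
    struct accel_cmd cmds[CMDBUF_CAPACITY];
    size_t count;               /* commands currently queued */
    unsigned long flushes;      /* stand-in for kernel crossings */
    unsigned long executed;     /* total commands handed over */
};

/* Hand the whole batch over at once.  In a real driver this would be the
 * one ioctl() (or shared-buffer kick) that replaces one ioctl per command. */
void cmdbuf_flush(struct cmdbuf *cb)
{
    if (cb->count == 0)
        return;
    cb->flushes++;
    cb->executed += cb->count;
    cb->count = 0;
}

/* Queue a command, flushing first only when the buffer is already full. */
void cmdbuf_queue(struct cmdbuf *cb, unsigned int reg, unsigned int value)
{
    if (cb->count == CMDBUF_CAPACITY)
        cmdbuf_flush(cb);
    cb->cmds[cb->count].reg = reg;
    cb->cmds[cb->count].value = value;
    cb->count++;
}
```

        With this layout, 100 queued commands followed by a final flush
cost two crossings instead of 100.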

> On a related note - direct acceleration using KGIcon
> is also about 50% slower than direct acceleration using matroxfb,
> so something must be broken with our mmap() handler.

        Our mmap() handler sets the _PAGE_PCD (page cache disable) flag on
the regions it maps.  I'll bet that's it.  And we are doing the right
thing - this flag should _always_ be enabled for MMIO register regions.  
If matroxfb works without it, great (or lucky), but lots of other chipsets
can react very badly if their register MMIO goes through the L2 cache. The
linux-kernel archives show that this flag used to always be set for high
PCI memory region mappings, to avoid XFree86-related problems.  

        We also enable this flag for the LFB region, which is actually
_not_ necessary in most (all?) cases and probably costs us some
performance.  The one-size-fits-all solution we are using now in
kgifb_mmap() looks like it needs replacing - there are too many
driver-specific special cases to handle cleanly in fbcon-kgi.c.
Perhaps we need to start making use of the 'prot' (protection flags)
field in the kgi_mmio_region struct.  We need this to let the driver
tell fbcon-kgi.c about cache policy, MTRR policy, AGP flags, etc. on a
per-region basis.
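
        A minimal sketch of what per-region cache policy could look like,
written as plain userspace C rather than real kernel code.  The
region_desc struct, the region_cache_policy enum and region_page_flags()
are all hypothetical; they are not the actual kgi_mmio_region layout or
any existing KGIcon interface.  The two bit values are the x86 page-table
PWT and PCD bits.  The point is only that the driver states a policy per
region and the mmap() path derives the page flags from it, falling back
to uncached when it does not recognise the policy.

```c
/* x86 page-table cache-control bits. */
#define X86_PAGE_PWT 0x008UL    /* page write-through */
#define X86_PAGE_PCD 0x010UL    /* page cache disable */

/* Hypothetical policy a driver would report for each region it exports. */
enum region_cache_policy {
    REGION_UNCACHED,            /* MMIO registers: never cache */
    REGION_WRITE_THROUGH,       /* cached reads, writes go straight out */
    REGION_CACHED               /* driver says default caching is safe */
};

/* Hypothetical region descriptor, NOT the real kgi_mmio_region. */
struct region_desc {
    const char *name;
    unsigned long size;
    enum region_cache_policy policy;
};

/* Derive page flags for one region from the driver-supplied policy. */
unsigned long region_page_flags(const struct region_desc *r,
                                unsigned long base_flags)
{
    switch (r->policy) {
    case REGION_UNCACHED:
        return base_flags | X86_PAGE_PCD;
    case REGION_WRITE_THROUGH:
        return (base_flags & ~X86_PAGE_PCD) | X86_PAGE_PWT;
    case REGION_CACHED:
        return base_flags & ~(X86_PAGE_PCD | X86_PAGE_PWT);
    }
    /* Unknown policy: stay conservative and disable caching. */
    return base_flags | X86_PAGE_PCD;
}
```

        A driver would then describe its register block as
REGION_UNCACHED and, where it knows this is safe, its LFB with one of
the other policies, instead of every region getting the same flags.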

        This opens up a whole can of worms when non-x86 architectures are
considered, however.  Different combinations of CPUs, buses and chipsets
will often need to be handled on a case-by-case basis.  As with
multiheading, there isn't a whole lot of commonality of rules here that
would allow us to abstract this stuff away from the driver.  It
degenerates quickly into a mess of special cases |-<.  I dislike the idea
of pushing this knowledge down into the drivers and bloating them with a
bunch of platform-dependent #ifdef blocks, but right now I honestly can't
see how else to handle this stuff in a way that can ensure that all the
strange edge cases are handled properly.  And they MUST be handled
properly, or system stability can be at risk.  Ideas?

Jon

---
'Cloning and the reprogramming of DNA is the first serious step in 
becoming one with God.'
        - Scientist G. Richard Seed
