Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-11 Thread Ewald Snel

 At 11:26 AM 4/01/02 +0100, Ewald Snel wrote:

(sorry for the duplicate message, it was delayed for one week (see date))

[...]

 It would be interesting to see if the same could be achieved with 3DNow!
 instructions, as this would provide a welcome boost for anyone with an AMD
 K6-2 or K6-3 or any of the other 3DNow!-capable CPUs. I'm sure there are

Using MMX will benefit any CPU capable of MMX instructions, including AMD K6, 
K6-2, K6-3 and Athlon/Duron processors. That's why I did not use SSE or 
3DNow!.

 also a number of other platforms that could use in-line assembly to do the
 same (e.g. PPC/AltiVec).

 Out of interest, how much in-line assembly code are you referring to?
 Anywhere some of us can get a look-see?

Here's an image of what it looks like ...
http://rambo.its.tudelft.nl/~ewald/xfree86-chrominance-filter.jpg

And here are some patches ...
http://rambo.its.tudelft.nl/~ewald/

bye,

ewald
___
Xpert mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/xpert



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-11 Thread Ewald Snel

[...]

 BTW - does anyone know why the mga driver internally converts to 422
 format? It seems to me that mga 400 and 450 chips do support 420
 planar format... (I saw some sample code using it, I can probably find
 it back if needed). I think XFree would benefit from using this
 feature instead of converting to nonplanar 422.

I also wrote a patch for this several months ago (even before XFree86-4.1.0).
If you're interested, I've uploaded it here:

http://rambo.its.tudelft.nl/~ewald/XFree86-4.0.99.3-mga-xv-planar-data.patch

Decoding DVD movies on a PII-350 is about 13% faster using planar format
instead of converting to YUY2. Unfortunately, the Matrox hardware cannot
filter the chrominance component in the vertical direction, so you can't
have planar output and vertical chroma filtering at the same time.
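
For reference, the conversion that planar output avoids can be sketched in C
roughly as follows. This is an illustrative sketch, not code from the mga
driver or from the patch: the function name planar420_to_yuy2 is
hypothetical, and it assumes even dimensions, tightly packed planes, and
plain chroma line doubling (no vertical filtering).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack planar 4:2:0 (separate Y, U and V planes) into YUY2 (Y0 U Y1 V).
 * Each chroma line is reused for two luma lines, i.e. no vertical filtering.
 * Assumes width and height are even and all planes are tightly packed.
 */
static void planar420_to_yuy2(const uint8_t *y, const uint8_t *u,
                              const uint8_t *v, uint8_t *dst,
                              size_t width, size_t height)
{
    size_t cw = width / 2;                      /* chroma plane width */

    for (size_t row = 0; row < height; row++) {
        const uint8_t *yrow = y + row * width;
        const uint8_t *urow = u + (row / 2) * cw;   /* same chroma line for */
        const uint8_t *vrow = v + (row / 2) * cw;   /* rows 2k and 2k+1     */
        uint8_t *out = dst + row * width * 2;       /* 2 bytes per pixel    */

        for (size_t x = 0; x < cw; x++) {
            out[4 * x + 0] = yrow[2 * x];
            out[4 * x + 1] = urow[x];
            out[4 * x + 2] = yrow[2 * x + 1];
            out[4 * x + 3] = vrow[x];
        }
    }
}
```

The per-pixel interleaving in the inner loop is the work that goes away when
the hardware reads the three planes directly.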

 Cheers,

bye,

ewald



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-10 Thread Stuart Young

At 11:26 AM 4/01/02 +0100, Ewald Snel wrote:
 Hi,
 
 Could I use MMX assembly for improving the mga video driver? I wrote a
 vertical chrominance filter (*) for the XVideo module using inline MMX
 assembly. This allows me to improve output quality without any speed penalty.

It would be interesting to see if the same could be achieved with 3DNow!
instructions, as this would provide a welcome boost for anyone with an AMD
K6-2 or K6-3 or any of the other 3DNow!-capable CPUs. I'm sure there are
also a number of other platforms that could use in-line assembly to do the
same (e.g. PPC/AltiVec).

Out of interest, how much in-line assembly code are you referring to? 
Anywhere some of us can get a look-see?


Stuart Young - [EMAIL PROTECTED]
(aka Cefiar) - [EMAIL PROTECTED]

[All opinions expressed in the above message are my]
[own and not necessarily the views of my employer..]




Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

Ewald Snel ([EMAIL PROTECTED]):

 Of course, I'm using #ifdef USE_MMX_ASM and the original C code as
 an alternative for other CPU architectures. Runtime detection of MMX
 support is not included yet, but will be added if MMX is allowed.

  I've also been playing with some mmx-ification of the XVideo routines;
for example, I also did an SSE 4:2:0-to-4:2:2 function.

  There was some discussion on #xfree86 about efforts to have a nice
runtime detection mechanism somewhere.  Has anyone got any code for this
already done?  If not I might also have a go at it.
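
A minimal sketch of such a runtime check, assuming GCC or Clang on x86 and
their <cpuid.h> helper __get_cpuid. It is not code from the X tree, and the
names cpu_has_mmx and CPUID_EDX_MMX are made up here. MMX support is reported
in bit 23 of EDX for CPUID leaf 1; on other compilers or architectures the
sketch simply reports no MMX.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* GCC/Clang wrapper around the CPUID insn */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

#define CPUID_EDX_MMX (1u << 23)    /* CPUID leaf 1, EDX bit 23 = MMX */

/* Returns 1 if the running CPU reports MMX support, 0 otherwise. */
static int cpu_has_mmx(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx & CPUID_EDX_MMX) != 0;
#else
    return 0;                   /* assumption: no MMX path outside x86 */
#endif
}
```

A driver could call this once at init time and store a function pointer to
either the MMX routine or the C fallback, so the check is not repeated per
frame.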

-- 
Billy Biggs
[EMAIL PROTECTED]



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread greg wright

  I've also been playing with some mmx-ification of the XVideo routines,
  for example I also did an SSE-4:2:0-to-4:2:2 function.

I just did this too, MMX only though. How many cycles/pixel did you
end up with? What percentage of pairing did you achieve?

   There was some discussion on #xfree86 about efforts to have a nice
 runtime detection mechanism somewhere.  Has anyone got any code for this
 already done?  If not I might also have a go at it.


there are plenty of samples of this on Intel's site.

--greg

 





Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Erik Walthinsen

On Fri, 4 Jan 2002, greg wright wrote:

 I just did this too, MMX only though. How many cycles/pixel did you
 end up with? What percentage of pairing did you achieve?

Note that only P5-core chips care about pairing per se.  There are much
nastier issues involved in modern P6 cores.  I haven't thought about them
for quite a while, so it'd take me a while to dig out the stuff and put it
back into main memory, but I think I have a pretty good understanding of
how the P6 really works...

 there are plenty of samples of this on Intel's site.

Unfortunately that just isn't very useful outside Intel's world.  There
are about a half-dozen manufacturers of x86 chips that matter, and they
all have all sorts of bizarre quirks.  I ran across a SourceForge project a
few days ago (x86info I think) that tries to deal with that, but I didn't
look at the code.

There's a larger issue when it comes to other architectures.  There are
similar but in some cases nastier problems on things like PPC and Alpha.
This is why I want to gather all this into a single library.  It would go
closely with my other projects, SpeciaLib and libcodec, which focus on
run-time specialization of time-critical kernels, such as the
motion-compensation code in an MPEG decoder, or color-space
conversion/transliterations, etc. (as in the 4:2:0 to 4:2:2 problem).

You can see a lot of this stuff at http://codecs.org/, though SpeciaLib
itself isn't there because it's not anywhere near formed enough for CVS.

  Erik Walthinsen [EMAIL PROTECTED] - System Administrator
__
   /  \GStreamer - The only way to stream!
  || M E G A* http://gstreamer.net/ *
  _\  /_






Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

  I've also been playing with some mmx-ification of the XVideo
  routines, for example I also did an SSE-4:2:0-to-4:2:2 function.
 
 I just did this too, MMX only though. How many cycles/pixel did you
 end up with? What percentage of pairing did you achieve?

  I'll get some numbers in a sec.

  There was some discussion on #xfree86 about efforts to have a nice
  runtime detection mechanism somewhere.  Has anyone got any code for
  this already done?  If not I might also have a go at it.
 
 there are plenty of samples of this on Intel's site.

 And in many nice abstracted open source modules.  :)  Specifically I
meant code to put this somewhere appropriate in the X tree.

-- 
Billy Biggs
[EMAIL PROTECTED]



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Ewald Snel

Hi,

  I wrote a vertical chrominance filter (*) for the XVideo module using
  inline MMX assembly. This allows me to improve output quality without
  any speed penalty.

   Do you mean for upsampling to 4:2:2 ?  How do you filter?  Do you
 average to create the new chroma line?

Something like that: the filter uses 0.75x the nearest chrominance sample and
0.25x the second-nearest chrominance sample. This is more accurate, as it
doesn't shift the chrominance signal by 1 pixel.

Here are the patches, the second one is for enabling the horizontal filtering 
in hardware:

http://rambo.its.tudelft.nl/~ewald/XFree86-4.1.99.4-mga-xv-mmx-chromafilter.patch
http://rambo.its.tudelft.nl/~ewald/XFree86-4.2.0-mga-xv-uvfilter.patch

These are not paired for Pentium MMX, but performance is already better than 
the C version (which compiles to slow movzx instructions). It's nearly 
optimal for AMD Athlon though (about 2 IPC using L1-cache).
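
To illustrate the kind of inner loop involved, here is a hedged sketch of the
0.75/0.25 blend for one row, using MMX intrinsics from <mmintrin.h> when the
compiler defines __MMX__ and plain C otherwise. It is not the code from the
patches above (those use inline assembly); the name blend_3_1 and the
round-to-nearest "+ 2" term are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* dst[i] = (3 * near_row[i] + far_row[i] + 2) >> 2 for n bytes,
 * i.e. 0.75 * near + 0.25 * far, rounded to nearest. */
static void blend_3_1(uint8_t *dst, const uint8_t *near_row,
                      const uint8_t *far_row, size_t n)
{
    size_t i = 0;
#if defined(__MMX__)
    const __m64 zero  = _mm_setzero_si64();
    const __m64 three = _mm_set1_pi16(3);
    const __m64 two   = _mm_set1_pi16(2);

    for (; i + 8 <= n; i += 8) {
        __m64 a, b, lo, hi, r;
        memcpy(&a, near_row + i, 8);    /* unaligned-safe 8-byte loads */
        memcpy(&b, far_row + i, 8);

        /* Widen bytes to 16-bit words so 3*a + b + 2 cannot overflow. */
        lo = _mm_add_pi16(_mm_mullo_pi16(_mm_unpacklo_pi8(a, zero), three),
                          _mm_unpacklo_pi8(b, zero));
        hi = _mm_add_pi16(_mm_mullo_pi16(_mm_unpackhi_pi8(a, zero), three),
                          _mm_unpackhi_pi8(b, zero));
        lo = _mm_srli_pi16(_mm_add_pi16(lo, two), 2);
        hi = _mm_srli_pi16(_mm_add_pi16(hi, two), 2);

        r = _mm_packs_pu16(lo, hi);     /* back to 8 unsigned bytes */
        memcpy(dst + i, &r, 8);
    }
    _mm_empty();                        /* leave MMX state before FPU use */
#endif
    /* Scalar tail, and the whole loop when MMX is unavailable. */
    for (; i < n; i++)
        dst[i] = (uint8_t)((3 * near_row[i] + far_row[i] + 2) >> 2);
}
```

The MMX path handles eight chroma bytes per iteration without per-byte zero
extension, which is where the scalar version spends its movzx instructions.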

bye,

ewald



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Erik Walthinsen

On Fri, 4 Jan 2002, Billy Biggs wrote:

   Please, please correct me if I'm wrong here.  In MPEG sampling, the
 chrominance sample is halfway between the two luminance samples on the
 same vertical scanline (per ISO/IEC 13818-2):

     o   o      where   o == luma sample
     x                  x == chroma sample
     o   o

Note that this depends on which version of MPEG you're talking about.  I
forget which (I can look it up if anyone's interested), but one of the
MPEG standards specifies that the chroma samples are located between the
lumas in both dimensions, i.e.:

o   o
  x
o   o

   So, are not the chroma samples above and below the same distance away?
 I thought this was the purpose of MPEG sampling, that is, it's
 reasonable to convert to 4:2:2 sampling by doubling the scanlines.

Possibly, but you have to beware what the chroma position is for the 4:2:2
as well.  If the 4:2:2 specifies colocated first luma and chroma, it will
work nicely for the first form (above).  If in the middle, it'll work for
the second form.

   What do you mean by shifting the chroma by one pixel?

If a chroma sample is colocated with a luma sample (in either dimension),
you get the following:

ooooo
 x x
|^|^|

Where a single chroma sample impacts three adjacent pixels (note the
difference between pixel and sample...), and the luma samples in the
middle actually get chroma from two different chroma samples.  In this
case you have to give differing amounts to each new (resampled) sample,
according to the percentages mentioned previously.

  Erik Walthinsen [EMAIL PROTECTED] - System Administrator




Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Ewald Snel

Hi,

[...]

  Something like that, the filter uses 0.75x nearest chrominance sample
  and 0.25x second nearest chrominance sample. This is more accurate as
  it doesn't shift the chrominance signal by 1 pixel.

   Please, please correct me if I'm wrong here.  In MPEG sampling, the
 chrominance sample is halfway between the two luminance samples on the
 same vertical scanline (per ISO/IEC 13818-2):

I think you're right, my interpolation looks like this :

o   o   (c=.75*c1 + .25*c0)
 c1
o   o   (c=.75*c1 + .25*c2)

o   o   (c=.75*c2 + .25*c1)
 c2
o   o   (c=.75*c2 + .25*c3)
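
Read as code, the diagram above corresponds roughly to the following plain C
sketch, which expands a chroma plane of ch lines into 2 * ch lines. The
function name, the rounding, and the handling of the first and last line
(reusing the current line where no neighbour exists) are assumptions, not
taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Vertically upsample one chroma plane from ch lines to 2*ch lines.
 * Output line 2k   = 0.75 * c[k] + 0.25 * c[k-1]
 * Output line 2k+1 = 0.75 * c[k] + 0.25 * c[k+1]
 * At the top and bottom edge the missing neighbour is replaced by c[k].
 */
static void chroma_upsample_v(const uint8_t *src, uint8_t *dst,
                              size_t cw, size_t ch)
{
    for (size_t k = 0; k < ch; k++) {
        const uint8_t *cur   = src + k * cw;
        const uint8_t *above = src + (k > 0 ? k - 1 : k) * cw;
        const uint8_t *below = src + (k + 1 < ch ? k + 1 : k) * cw;
        uint8_t *top = dst + (2 * k) * cw;
        uint8_t *bot = top + cw;

        for (size_t x = 0; x < cw; x++) {
            top[x] = (uint8_t)((3 * cur[x] + above[x] + 2) >> 2);
            bot[x] = (uint8_t)((3 * cur[x] + below[x] + 2) >> 2);
        }
    }
}
```

For a one-pixel-wide plane with lines 0, 100, 200 this yields 0, 25, 75, 125,
175, 200, so adjacent output lines no longer share identical chroma values.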

[...]

   So, are not the chroma samples above and below the same distance away?
 I thought this was the purpose of MPEG sampling, that is, it's
 reasonable to convert to 4:2:2 sampling by doubling the scanlines.

It's reasonable, but doubling the scanlines will make the image look a little 
blocky as both scanlines use the same chrominance values. That's why you 
should use filtering.

   Are you sure that maybe the images where you see that nasty chroma
 artifact aren't from when the DVD is using interlaced encoding?  In this
 case, each second chroma sample is from a different field, and you can
 get blocky errors because you don't correlate samples correctly.

The source was a non-interlaced MPEG-1 video file. The red blocks are very 
small for (high resolution) DVD movies, but they are still visible.

   What do you mean by shifting the chroma by one pixel?

It's actually 0.5 pixel (my mistake :)) using the following filter :

o   o   (c=c1)
 c1
o   o   (c=.5*c1 + .5*c2)

o   o   (c=c2)
 c2
o   o   (c=.5*c2 + .5*c3)
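
The half-pixel figure can be checked with a few lines of C. The sketch below
assumes the siting discussed above, with chroma sample k sitting halfway
between luma lines 2k and 2k+1, and computes where the weighted average of
two chroma samples effectively lies. The helper names chroma_pos and
blend_pos are made up for this illustration.

```c
/* Vertical position of chroma sample k in luma-line units, assuming it
 * lies halfway between luma lines 2k and 2k+1. */
static double chroma_pos(int k)
{
    return 2.0 * k + 0.5;
}

/* Effective vertical position of the blend w_a * c[a] + w_b * c[b],
 * where the weights are expected to sum to 1. */
static double blend_pos(double w_a, int a, double w_b, int b)
{
    return w_a * chroma_pos(a) + w_b * chroma_pos(b);
}
```

For luma lines 2 and 3, blend_pos(0.75, 1, 0.25, 0) and
blend_pos(0.75, 1, 0.25, 2) give 2.0 and 3.0, exactly on the luma lines,
while blend_pos(1.0, 1, 0.0, 1) and blend_pos(0.5, 1, 0.5, 2) give 2.5 and
3.5, a constant shift of half a line.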

bye,

ewald



Re: [Xpert]Using MMX assembly (for video card drivers)

2002-01-04 Thread Billy Biggs

  To reply to my own mail  :)

Billy Biggs ([EMAIL PROTECTED]):

  It's actually 0.5 pixel (my mistake :)) using the following filter :
  
  o   o   (c=c1)
   c1
  o   o   (c=.5*c1 + .5*c2)
  
  o   o   (c=c2)
   c2
  o   o   (c=.5*c2 + .5*c3)
 
   I don't think this is right for MPEG2.

  I sent this and realized I might look like an asshole.  :)  This
should read:

  Thanks, I see what you mean now, and yeah, I think this filter is
wrong for filtering chroma from MPEG2.  :)

  Apologies.

-- 
Billy Biggs
[EMAIL PROTECTED]