On Wednesday, 29 January 2003 00:05, Ian Romanick wrote:
> Felix Kühling wrote:
> > On Tue, 28 Jan 2003 13:10:41 -0800
> >
> > Ian Romanick <[EMAIL PROTECTED]> wrote:
> >>Felix Kühling wrote:
> >>>The patch moves the load operations back to the front of the loop as in
> >>>the G3TN_norm_w_lengths case.
> >>
> >>Good catch.  It looks like this went into the Mesa tree back in October
> >>of 2001...over a year ago!  It looks like Andres Lewycky gave Brian some
> >>bad patches. :(
> >
> > Yeah, but until November 2002 (DRI trunk) there was a comment in 3dnow.c
> > that the 3dnow-normal code is broken and it was not used.
>
> D'oh!

;-)

> >>I realize that AMD recommends reading memory backwards, but would a
> >>quick-fix be to just use the 3Dnow! prefetch instructions?

"Block Prefetch", page 18, see below.

> > The prefetch instructions used are and must be 3DNow! instructions. On
> > Intel, prefetch was introduced with the SSE extension on the Pentium III.
> > The SSE prefetch instructions are not available on older Athlons and K6s.

It all depends on steppings...

Some output from MPlayer, the best-optimized OSS app I know:

CPU: Advanced Micro Devices Athlon 4 PM Palomino/Athlon MP Multiprocessor/Athlon XP eXtreme Performance (Family: 6, Stepping: 2)
Detected cache-line size is 64 bytes
CPUflags:  MMX: 1 MMX2: 1 3DNow: 1 3DNow2: 1 SSE: 1 SSE2: 0
Compiled for x86 CPU with extensions: MMX MMX2 3DNow 3DNowEx SSE

> > Anyway, all that
> > prefetching looks odd to me. In the first transform loop in
> > _mesa_3dnow_transform_normalize_normals memory is prefetched which is
> > never read but only written. This is obviously useless. Then in the
> > normalize loop the memory which was written before is prefetched again.
> > I think this is not necessary. The array is small enough to be still in
> > the cache.
>
> I believe that prefetchw tells the processor to warm up the cache line
> because it's going to be written soon.  I think the prefetching in the
> first loop is probably correct.  The prefetchw of (%eax) might need to
> be before the add.  I'd have to benchmark it.  I'm not sure if I have a
> 3dnow capable box around anymore.  If I do, it will be an old K6-III. :)
>
> > I'll see if I can clean this up a bit. On the mesa-4-0-4 branch this
> > code is disabled anyway, so there is not really a hurry to apply my
> > stupid little patch. About this reading backward thing, where is that
> > documented? I have an AMD Athlon optimization guide from February 2002
> > which doesn't mention it.
>
> I've seen a reference posted to dri-devel a couple times.

All from me;-)

> Here are a couple of references that Dieter posted on 09-Jan-2003:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103548024914815&w=2
> http://208.15.46.63/events/gdc2002.htm

And here are some numbers:

nuetzel/Entwicklung> ./athlon-DN
1600.081 MHz
clear_page by 'normal_clear_page'        took 12757 cycles (489.9 MB/s)
clear_page by 'slow_zero_page'           took 12478 cycles (500.9 MB/s)
clear_page by 'fast_clear_page'          took 9684 cycles (645.4 MB/s)
clear_page by 'faster_clear_page'        took 4257 cycles (1468.0 MB/s)

copy_page by 'normal_copy_page'  took 9063 cycles (689.6 MB/s)
copy_page by 'slow_copy_page'    took 9051 cycles (690.5 MB/s)
copy_page by 'fast_copy_page'    took 8125 cycles (769.3 MB/s)
copy_page by 'faster_copy'       took 5468 cycles (1143.0 MB/s)
copy_page by 'even_faster'       took 5538 cycles (1128.5 MB/s)
copy_page by 'no_prefetch'       took 4462 cycles (1400.7 MB/s)
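
For anyone who wants to check how the MB/s column relates to the cycle
counts, here is a small C sketch. It assumes (the benchmark output does not
say so) that one page is 4096 bytes and that "MB" means 2^20 bytes. The
function name is made up for this mail. With those assumptions the results
come out close to the printed figures; the small differences presumably come
from rounding inside the benchmark.

```c
/* Assumptions, not stated in the benchmark output:
 *   - one page is 4096 bytes
 *   - "MB" means 2^20 bytes
 * page_throughput_mb_s is a hypothetical helper name. */
static double page_throughput_mb_s(double cpu_mhz, double cycles)
{
    const double page_bytes = 4096.0;
    /* Time for one page operation in seconds. */
    double seconds = cycles / (cpu_mhz * 1.0e6);

    return page_bytes / seconds / (1024.0 * 1024.0);
}
```

Calling page_throughput_mb_s(1600.081, 12757.0) gives the rate for the
'normal_clear_page' row, and the other rows work the same way.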

> I'm not sure if this applies to the K6 family or just to Athlons.  I
> suspect it may only apply to Athlons, but we may have to test it.

According to AMD (see the gdc2002.htm presentation), it applies to _all_ modern
x86 CPUs out there.
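
To make the idea concrete, here is a rough portable C sketch of a
block-prefetch style copy as I understand it: touch one byte per cache line
of a block, walking from the highest address down, then copy the block. The
64-byte line size matches the MPlayer output above; the 4096-byte block
size and the function name are my own placeholders. This is not the
MMX/3DNow! assembly from the references, and whether it helps on a given
CPU has to be benchmarked.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE  64    /* assumed line size, as detected above */
#define BLOCK_BYTES 4096  /* hypothetical block size, needs tuning */

/* Copy n bytes from src to dst, one block at a time.
 * block_prefetch_copy is a made-up name for this sketch. */
static void block_prefetch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    volatile unsigned char sink = 0;  /* keeps the touch loads alive */

    while (n > 0) {
        size_t chunk = n < BLOCK_BYTES ? n : BLOCK_BYTES;
        size_t lines = (chunk + CACHE_LINE - 1) / CACHE_LINE;

        /* Pass 1: read one byte per line, highest address first,
         * so the whole block is pulled into the cache. */
        for (size_t i = lines; i-- > 0; )
            sink = s[i * CACHE_LINE];

        /* Pass 2: copy the block, which should now be cached. */
        memcpy(d, s, chunk);

        d += chunk;
        s += chunk;
        n -= chunk;
    }
    (void)sink;
}
```

The sketch only shows the access pattern. It assumes nothing about
alignment, so an unaligned source may leave the last line of a block
untouched in pass 1; the copy itself is still correct.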

> >>Since these functions are globally exported, it might be worth it to
> >>write a quick test that calls the various _transform_normalize_normals
> >>functions to make sure that they all produce the same (or close enough)
> >>results.
> >
> > And:
> > _transform_normalize_normals_no_rot
> > _transform_rescale_normals_no_rot
> > _transform_rescale_normals
> > _transform_normals_no_rot
> > _transform_normals
> > _normalize_normals
> > _rescale_normals
> >
> > These should be tested too, while we're at it.

Yes.
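
A minimal sketch of what such a comparison could look like, in plain C. The
function-pointer signature is hypothetical: the real Mesa entry points take
matrix and vector structures that I am not reproducing here. The three
rescale functions are stand-ins for a C reference path, an optimized path,
and a deliberately broken path. Only the compare step is the point.

```c
#include <stddef.h>

/* Hypothetical common signature, NOT the real Mesa one. */
typedef void (*normal_fn)(float scale, const float *in,
                          float *out, size_t count);

/* Stand-in reference: rescale every component of each xyz triple. */
static void rescale_ref(float scale, const float *in,
                        float *out, size_t count)
{
    for (size_t i = 0; i < 3 * count; i++)
        out[i] = in[i] * scale;
}

/* Stand-in for an optimized path: same math, one normal per iteration. */
static void rescale_alt(float scale, const float *in,
                        float *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        out[3 * i + 0] = scale * in[3 * i + 0];
        out[3 * i + 1] = scale * in[3 * i + 1];
        out[3 * i + 2] = scale * in[3 * i + 2];
    }
}

/* Deliberately broken stand-in: forgets to scale z. */
static void rescale_broken(float scale, const float *in,
                           float *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        out[3 * i + 0] = scale * in[3 * i + 0];
        out[3 * i + 1] = scale * in[3 * i + 1];
        out[3 * i + 2] = in[3 * i + 2];
    }
}

#define MAX_NORMALS 64  /* arbitrary scratch size for the sketch */

/* Largest absolute component difference between two implementations,
 * or -1.0f if count does not fit the scratch buffers. */
static float max_normal_diff(normal_fn a, normal_fn b, float scale,
                             const float *in, size_t count)
{
    float out_a[3 * MAX_NORMALS], out_b[3 * MAX_NORMALS];
    float worst = 0.0f;

    if (count > MAX_NORMALS)
        return -1.0f;

    a(scale, in, out_a, count);
    b(scale, in, out_b, count);

    for (size_t i = 0; i < 3 * count; i++) {
        float d = out_a[i] - out_b[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A test would call max_normal_diff() for each pair of implementations and
fail when the result exceeds a tolerance such as 1e-5. Comparing against a
tolerance instead of testing for exact equality leaves room for the reduced
precision of the 3DNow! reciprocal instructions.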

-Dieter

