On Tue, 2 Apr 2002, Raystonn wrote:

> > That is far from the truth - they have internal pipelining
> > and parallelism. Their use of silicon can be optimised to balance
> > the performance of just one single algorithm. You can never do that
> > for a machine that also has to run an OS, word process and run
> > spreadsheets.
>
> Modern processors have internal pipelining and parallelism as well.
Yes - and yet they still have horrible problems every time you have a
conditional branch instruction. That's because they are trying to
convert a highly linear operation (code execution) into some kind of a
parallel form. Graphics is easier though. Each pixel and each polygon
can be treated as a stand-alone entity and can be processed in true
parallelism.

> Most of
> the processing power of today's CPUs go completely unused. It is possible
> to create optimized implementations using Single-Instruction-Multiple-Data
> (SIMD) instructions of efficient algorithms.

Which is a way of saying "Yes, you could do fast graphics on the CPU if
you put the GPU circuitry onto the CPU chip and pretend that it's now
part of the core CPU". I'll grant you *that* - but it's not the same
thing as doing the graphics in software.

> > Since 1989, CPU speed has grown by a factor of 70. Over the same
> > period the memory bus has increased by a factor of maybe 6 or so.
>
> We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
> to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years. We have
> over 16 times the memory bandwidth available today than we did just 5 years
> ago. Available memory bandwidth has been growing more quickly than
> processor clockspeed lately, and I do not foresee an end to this any time
> soon.

OK - so a factor of 70 in CPU growth and a factor of 16 in RAM speed.
My argument remains - and remember that whenever RAM gets faster, so do
the graphics cards. You can run faster - but you can't catch up if the
other guy is also running faster.

> > On the other hand, the graphics card can use heavily pipelined
> > operations to guarantee that the memory bandwidth is 100% utilised
>
> Overutilised in my opinion. The amount of overdraw performed by today's
> video cards on modern games and applications is incredible. Immediate mode
> rendering is an inefficient algorithm. Video cards tend to have extremely
> well optimized implementations of this inefficient algorithm.

That's because games *NEED* to do lots of overdraw. They are actually
pretty smart about eliminating the 'obvious' cases by doing things like
portal culling. Most of the overdraw comes from needing to do multipass
rendering (IIRC, the new Return To Castle Wolfenstein game uses up to
12 passes to render some polygons). The overdraw due to that kind of
thing is rather harder to eliminate with algorithmic sophistication. If
you need that kind of surface quality, your bandwidth out of memory
will be high no matter what.

> Kyro-based video cards perform quite well. They are not quite up to the
> level of nVidia's latest cards...

Not *quite*!!! Their best card is significantly slower than a GeForce
2MX - that's four generations of nVidia technology ago.

I agree that if this algorithm were to be implemented on a card with
the *other* capabilities of an nVidia card, then it would improve the
fill rate by perhaps a factor of two or four. (Before you argue about
that - realise that I've designed *and* built hardware and software
using this technology - and I've MEASURED its performance for 'typical'
scenes.)

But you can only draw scenes where the number of polygons being
rendered can fit into the 'scene capture' buffer. And that's the
problem with that technology. If I want to draw a scene with a couple
of million polygons in it (perfectly possible with modern cards) then
those couple of million polygons have to be STORED ON THE GRAPHICS
CARD. That's a big problem for an affordable graphics card.
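To put a rough number on that, here's a back-of-envelope sketch in C.
The 32-bytes-per-vertex figure and the assumption of plain, unshared
triangles are illustrative guesses of mine, not measured numbers:

    /* Back-of-envelope cost of capturing a whole scene on the card.
     * Assumptions (mine, for illustration only): plain triangles with
     * no vertex sharing, 32 bytes per vertex (xyz + normal + one uv,
     * all 4-byte floats). */
    #include <stdio.h>

    int main(void)
    {
        const double polygons       = 2.0e6; /* "a couple of million" */
        const double verts_per_poly = 3.0;
        const double bytes_per_vert = 32.0;

        double megabytes = polygons * verts_per_poly * bytes_per_vert
                           / (1024.0 * 1024.0);
        printf("scene capture buffer: roughly %.0f MB\n", megabytes);
        return 0;
    }

That works out to roughly 183MB. Even if indexing and vertex sharing
cut it by a factor of two or three, it's still a great deal of extra
RAM to hang off a consumer card.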
Adding another 128MB of fast RAM to store the scene in costs a lot more
than doubling the amount of processing power on the GPU. The amount of
RAM on the chip becomes a major cost driver for a $120 card.

None of those issues affect a software solution though - and it's
possible that a scene capture solution *could* be better than a
conventional immediate mode renderer - but I still think that it will
at MOST only buy you a factor of 2x or 4x pixel rate speedup, and you
have a MUCH larger gap than that to hurdle.

Also, in order to use scene capture, you are reliant on the underlying
graphics API to be supportive of this technique. Neither OpenGL nor
Direct3D is terribly helpful. You can write things like:

   Render 100 polygons.
   Read back the image they created.
   if the pixel at (123,456) is purple then
   {
     put that image into texture memory.
     Render another 100 polygons using the texture you just created.
   }

...scene capture algorithms have a very hard time with things like that
because you can only read back the image *after* it's been rendered -
but if you have to capture the entire scene in order to render it...
(There's a concrete OpenGL sketch of this pattern further down.)

I'm not saying that OpenGL and Direct3D are what you'd ideally want to
use for this kind of technique - but it'll take a lot to get another
new API accepted.

> > Everything that is speeding up the main CPU is also speeding up
> > the graphics processor - faster silicon, faster busses and faster
> > RAM all help the graphics just as much as they help the CPU.
>
> Everything starts out in hardware and eventually moves to software.

That's odd - I see the reverse happening.

First we had software 'rendering' the entire image directly to the DAC
(remember the Sinclair ZX-80?)...then we had graphics memory as a part
of the CPU address space (TRS-80, Pet, Apple ][) with hardware added to
clock it out to the DAC. Then we had hardware with its own RAM (PC's
MGA, CGA), then we added hardware to relieve the CPU of the blitting
tasks and other simple graphics functions (VGA), then we added polygon
fill (Voodoo, TNT, etc), then hardware Transform & Lighting (ATI
Radeon, GeForce-256), then things like skin and bones multi-matrix
stuff (GeForce-2), and now programmable graphics operations (GeForce-3).

I'm seeing things migrating from software *into* hardware. I can't
think of a single graphics operation that's gone the other way.

> There will come a time when the basic functionality provided by video
> cards can be easily done by a main processor.

Well, in a sense. A modern CPU can probably render pixels faster than a
Voodoo-1. So games that *used* to only run in hardware *could* now be
run in software - but modern games *NEED* all the performance of a
modern graphics card - and the CPU won't come close to meeting it. As
CPUs get faster, graphics cards get *MUCH* faster. Remember this:

>> If you doubt this, look at the progress over the last 5 or 6
>> years. In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate.
>> In 2002 GeForce-4 has a fill rate of 4.8 Billion (antialiased)
>> pixels/sec - it's 100 times faster.
>> Over the same period, your 1996 vintage 233MHz CPU has scaled
>> up to a 2GHz machine ...a mere 10x speedup.

CPUs aren't "catching up" - they are getting left behind.

> The extra features offered by the video cards, such as pixel shaders,
> are simply attempts to stand-in as a main processor.

They are adding in steps to the graphics processing that are
programmable. That's further reducing the need to go back to the CPU
where additional flexibility is needed.
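As an aside, the 'read a pixel back and decide what to do next' example
from earlier is exactly that kind of trip back to the CPU. A minimal
OpenGL 1.1 sketch of it might look like this - the draw_first_batch()
and draw_second_batch() helpers, the 256x256 copy size and the 'purple'
threshold are placeholders of mine, not code from any real application:

    #include <GL/gl.h>

    /* Placeholder drawing routines - stand-ins for "render 100 polygons". */
    extern void draw_first_batch(void);
    extern void draw_second_batch(void);

    void render_with_readback(void)
    {
        GLubyte pixel[3];

        draw_first_batch();

        /* Read back one pixel of the image those polygons created. */
        glReadPixels(123, 456, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pixel);

        /* "if the pixel at (123,456) is purple"... */
        if (pixel[0] > 200 && pixel[2] > 200 && pixel[1] < 64)
        {
            GLuint tex;

            /* Put that image into texture memory... */
            glGenTextures(1, &tex);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 0, 0, 256, 256, 0);

            /* ...and render the next batch of polygons using that texture. */
            glEnable(GL_TEXTURE_2D);
            draw_second_batch();
            glDisable(GL_TEXTURE_2D);
        }
    }

The reason scene capture hardware hates this is that glReadPixels
forces everything submitted so far to be fully rendered before it can
return - which is exactly the problem described above.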
This trend isn't an indication that we need the CPU *more* - it shows
that we don't need it as much, because where flexibility was lacking in
the rendering process, we are putting it into the graphics hardware.
This is the entire thrust of the OpenGL 2.0 initiative.

> Intel is capable of pushing microprocessor technology more quickly
> than nVidia or ATI, regardless of how much nVidia wants their
> technology to be at the center of the chipset.

So how come Intel CPUs have only doubled in speed over the last 18
months when nVidia's GPUs have sped up by a factor of four or so in the
same interval?

> > However, increasing the number of transistors you can have on
> > a chip doesn't help the CPU out very much. Their instruction
> > sets are not getting more complex in proportion to the increase
> > in silicon area - and their ability to make use of more complex
>
> What would you call MMX, SSE, SSE2, and even 3dnow? These are additional
> instructions designed to optimize the use of these new transistors.

Yes - but they are of *minute* benefit because compilers can't make
much use of them. Also, they are a small increment in functionality
that's happened over a period of something like 6 years...in that time,
graphics cards have been totally revolutionised by adding multitexture,
shader languages, transform and lighting...etc. In all that time, all
Intel have added are a couple of operations that work on four bytes in
parallel and a few low precision math operations - whereas graphics
cards have absorbed *ALL* of the OpenGL/D3D APIs!

> > instructions is already limited by the brain power of compiler
> > writers.
>
> Since when can you write a pixel shading routine in a standard C/C++
> compiler?

You can't - but you *can* use high level shading languages that are
better suited to describing surface properties than C/C++. We are close
to having RenderMan shaders implementable in hardware - there are a
couple of other shader language compilers that generate code for ATI
Radeon and nVidia GeForce cards - there is the SGI shader compiler and
of course the OpenGL 2.0 initiative. If you go look at the source code
for some of the later versions of Quake, you'll see that the authors of
that program wrote a shader language into it.

> > If you doubt this, look at the progress over the last 5 or 6
> > years. In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate.
> > In 2002 GeForce-4 has a fill rate of 4.8 Billion (antialiased)
> > pixels/sec - it's 100 times faster.
>
> Fill rate is just memory bandwidth. It is not hard to offer more memory
> channels. In fact, a dual-channel DDR chipset is coming soon for the
> Pentium 4. In May the Pentium 4 will have access to 4.3GB/s of memory
> bandwidth. Future generations will offer considerably more.

But all of those benefits are also available to graphics chips - you
have to get a 100-fold speedup from *somewhere*. RAM bandwidth *could*
possibly get you that - but then graphics cards will also have a 100x
RAM bandwidth speedup - so the relative performances will remain.

----
Steve Baker                       (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training  (817)619-2466 (Fax)
Work: [EMAIL PROTECTED]           http://www.link.com
Home: [EMAIL PROTECTED]           http://www.sjbaker.org

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel