> > The only thing video
> > cards have today that is really better than
> > the main processor is massive
> > amounts of memory bandwidth.
>
> That is far from the truth - they have internal pipelining
> and parallelism.  Their use of silicon can be optimised to balance
> the performance of just one single algorithm.  You can never do that
> for a machine that also has to run an OS, word process and run
> spreadsheets.

Modern processors have internal pipelining and parallelism as well. Most of the processing power of today's CPUs goes completely unused. It is possible to create optimized implementations of efficient algorithms using Single-Instruction-Multiple-Data (SIMD) instructions, along the lines of the sketch below.
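To illustrate the sort of SIMD optimization I mean, here is a minimal sketch in C++ using the SSE intrinsics from <xmmintrin.h>. The function name and the alignment/length assumptions are mine, purely for illustration:

    #include <xmmintrin.h>   /* SSE intrinsics, Pentium III and up */

    /* y[i] += a * x[i] (SAXPY), four elements per iteration.
       Assumes n is a multiple of 4 and 16-byte aligned pointers;
       a real version would also handle an unaligned head and tail. */
    void saxpy4(float *y, const float *x, float a, int n)
    {
        const __m128 va = _mm_set1_ps(a);            /* a,a,a,a */
        for (int i = 0; i < n; i += 4) {
            __m128 vx = _mm_load_ps(x + i);          /* load 4 floats */
            __m128 vy = _mm_load_ps(y + i);
            vy = _mm_add_ps(vy, _mm_mul_ps(vx, va)); /* 4 mul-adds at once */
            _mm_store_ps(y + i, vy);
        }
    }

A scalar loop retires one multiply-add per iteration; this retires four, using execution units that would otherwise sit idle.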
> > Since memory bandwidth is increasing rapidly,...
>
> It is?!?  Let's look at the facts:
>
> Since 1989, CPU speed has grown by a factor of 70.  Over the same
> period the memory bus has increased by a factor of maybe 6 or so.

We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM) to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years. That is more than 16 times (3.2GB/s / 200MB/s = 16) the memory bandwidth we had just 5 years ago. Available memory bandwidth has been growing more quickly than processor clock speed lately, and I do not foresee an end to this any time soon.

> On the other hand, the graphics card can use heavily pipelined
> operations to guarantee that the memory bandwidth is 100% utilised

Overutilised, in my opinion. The amount of overdraw performed by today's video cards in modern games and applications is incredible. Immediate-mode rendering is an inefficient algorithm; video cards simply have extremely well-optimized implementations of that inefficient algorithm.

> - and can use an arbitrarily large amount of parallelism to improve
> throughput.  The main CPU can't do that because its memory access
> patterns are not regular and it has little idea where the next byte
> has to be read from until it's too late.

Modern processors have a considerable amount of parallelism built in. With prefetch and streaming SIMD instructions it is very possible to do these kinds of operations on a modern processor; a sketch follows below. It will, however, take another couple of years to be able to render at great framerates and high resolutions.
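As a concrete (and hedged) example, the toy loop below uses _mm_prefetch to tell the processor exactly where the next bytes will be read from, and _mm_stream_ps to write results straight to memory without polluting the cache. The prefetch distance and the operation itself are arbitrary choices of mine, not tuned values:

    #include <xmmintrin.h>

    /* Transform a span of floats, prefetching ahead of the reads and
       streaming the writes past the cache.  Same alignment and
       multiple-of-4 length assumptions as before. */
    void transform_span(float *dst, const float *src, int nfloats)
    {
        const __m128 gain = _mm_set1_ps(1.5f);   /* placeholder operation */
        for (int i = 0; i < nfloats; i += 4) {
            /* Hint: we will need this data a few iterations from now. */
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_T0);
            __m128 v = _mm_load_ps(src + i);
            /* Non-temporal store: bypasses the cache on the way out. */
            _mm_stream_ps(dst + i, _mm_mul_ps(v, gain));
        }
        _mm_sfence();   /* make the streaming stores globally visible */
    }

This is exactly the trick of making the CPU's memory access pattern regular: the software, not the hardware, decides where the next byte comes from.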
> You only have to look at the gap you are trying to bridge - a
> modern graphics card is *easily* 100 times faster at rendering
> sophisticated pixels (with pixel shaders, multiple textures and
> antialiasing) than the CPU.

They are limited in what they can do. In order to allow more flexibility, they have recently introduced pixel shaders, which basically turn the video card into a mini-CPU. A modern processor can perform these programmable operations more quickly and allows an order of magnitude more flexibility in what can be done.

> > A properly
> > implemented and optimized software version of a tile-based "scene-capture"
> > renderer much like that used in Kyro could perform as well as the latest
> > video cards in a year or two.  This is what I am dabbling with at the
> > moment.
>
> I await this with interest - but 'scene capture' systems tend to be
> unusable with modern graphics API's...they can't run either OpenGL
> or Direct3D efficiently for arbitrary input.  If there were to be
> some change in consumer needs that would result in 'scene capture'
> being a usable technique - then the graphics cards can easily take
> that on board and will *STILL* beat the heck out of doing it in
> the CPU.  Scene capture is also only feasible if the number of
> polygons being rendered is small and bounded - the trends are
> for modern graphics software to generate VAST numbers of polygons
> on-the-fly precisely so they don't have to be stored in slow old
> memory.

Kyro-based video cards perform quite well. They are not quite up to the level of nVidia's latest cards, but this is new technology being worked on by a relatively new company. These cards do not require nearly as much memory bandwidth as immediate-mode renderers because they perform zero overdraw: visibility is resolved first, and each pixel is shaded exactly once. They are processing-intensive rather than bandwidth-intensive, which I see as the more efficient algorithm. The sketch below shows the basic idea.
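For the curious, here is a deliberately tiny sketch of the scene-capture idea in C++. Everything in it is invented for illustration: a real tile-based renderer bins triangles per tile, interpolates depth across each triangle, and shades with textures rather than flat colours. The point is only the structure: capture the scene first, resolve visibility per pixel, then shade each pixel once.

    #include <vector>
    #include <cfloat>

    struct Tri {                  /* screen-space triangle, simplified: */
        float x[3], y[3];         /* vertex positions                   */
        float depth;              /* one flat depth value per triangle  */
        unsigned colour;          /* one flat colour per triangle       */
    };

    /* Same-sign edge-function test: is (px,py) inside the triangle? */
    static bool inside(const Tri &t, float px, float py)
    {
        bool neg = false, pos = false;
        for (int i = 0; i < 3; ++i) {
            int j = (i + 1) % 3;
            float e = (t.x[j] - t.x[i]) * (py - t.y[i])
                    - (t.y[j] - t.y[i]) * (px - t.x[i]);
            if (e < 0) neg = true;
            if (e > 0) pos = true;
        }
        return !(neg && pos);
    }

    /* Render one captured tile.  Visibility is resolved before any
       shading happens, so every framebuffer word is written exactly
       once: zero overdraw, minimal memory traffic. */
    void render_tile(unsigned *fb, int w, int h,
                     const std::vector<Tri> &scene)
    {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                float nearest = FLT_MAX;
                unsigned c = 0;                      /* background */
                for (size_t k = 0; k < scene.size(); ++k)
                    if (inside(scene[k], x + 0.5f, y + 0.5f)
                        && scene[k].depth < nearest) {
                        nearest = scene[k].depth;    /* nearest wins */
                        c = scene[k].colour;
                    }
                fb[y * w + x] = c;                   /* single write */
            }
    }

An immediate-mode renderer would instead read, test, and rewrite the depth and colour buffers once per triangle per covered pixel, which is where all the overdraw bandwidth goes.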
> Everything that is speeding up the main CPU is also speeding up
> the graphics processor - faster silicon, faster busses and faster
> RAM all help the graphics just as much as they help the CPU.

Everything starts out in hardware and eventually moves to software. There will come a time when the basic functionality provided by video cards can easily be done by the main processor. The extra features offered by the video cards, such as pixel shaders, are really attempts to stand in for a main processor. Once the basic functionality of the video card can be performed by the main system processor, there will be no real need for extra hardware to perform these tasks. What I see now is a move by the video card companies toward software-based solutions (pixel shaders, etc.). They have recognized that there are limits to what specialized hardware can do, and they are now attempting to give programmers more flexibility. However, this is exactly the kind of functionality where the main system processor has a huge advantage. If more features are added in this manner (as software), then the specialized video card hardware will lose its edge. Intel is capable of pushing microprocessor technology more quickly than nVidia or ATI, regardless of how much nVidia wants their technology to be at the center of the chipset.

> However, increasing the number of transistors you can have on
> a chip doesn't help the CPU out very much.  Their instruction
> sets are not getting more complex in proportion to the increase
> in silicon area - and their ability to make use of more complex

What would you call MMX, SSE, SSE2, and even 3DNow!? These are additional instructions designed precisely to put those new transistors to use.

> instructions is already limited by the brain power of compiler
> writers.

Since when can you write a pixel shading routine in a standard C/C++ compiler? Assembly language can be used on the main processor just as easily as it can be used for pixel shaders via nVidia's own shader assembly language. In fact, there is a great deal more support for assembly language on the main processor.

> Most of the speedup in modern CPU's is coming from
> physically shorter distances for signals to travel and faster
> clocks - all of the extra gates typically end up increasing the
> size of the on-chip cache which has marginal benefits to graphics
> algorithms.
>
> In contrast to that, a graphics chip designer can just double
> the number of pixel processors or something and get an almost
> linear increase in performance with chip area with relatively
> little design effort and no software changes.

Modern processors have multiple parallel execution units for both integer and FPU operations, and more of these are added as time marches on. They have proven to offer a huge performance boost. Increasing processor performance is much more complex than a simple die shrink.

> If you doubt this, look at the progress over the last 5 or 6
> years.  In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate.
> In 2002 GeForce-4 has a fill rate of 4.8 Billion (antialiased)
> pixels/sec - it's 100 times faster.

Fill rate is just memory bandwidth, and it is not hard to offer more memory channels. In fact, a dual-channel DDR chipset is coming soon for the Pentium 4: in May the Pentium 4 will have access to 4.3GB/s of memory bandwidth (e.g. two 64-bit channels of DDR266 give 2 x 8 bytes x 266MT/s, roughly 4.3GB/s). Future generations will offer considerably more.

> The graphics cards are also gaining features.
> Over that same period, they added - windowing, hardware T&L,
> antialiasing, multitexture, programmability, you name it.
> Meanwhile the CPU's have added just a modest amount of MMX/3Dnow
> type functionality...almost none of which is actually *used*
> because our compilers don't know how to generate those new
> instructions in compiling generalised C/C++ code.

The Intel C/C++ compiler generates MMX, SSE, and SSE2 instructions if you tell it to do so. It requires no inline assembly, though inline assembly is always a good idea. SSE and SSE2 are used in nVidia's drivers...

> CONCLUSION.
> ~~~~~~~~~~~
> There is no sign whatever that CPU's are "catching up" with
> graphics cards - and no logical reason why they ever will.

I will have to disagree here. All indications are that the video card manufacturers are looking more and more toward 'programmable' features such as pixel shaders. If that is the direction things go, it will be relatively easy for the main processor to 'catch up': programmability is its specialty. At any rate, we will probably just have to agree to disagree here. ;)

-Raystonn