On Tue, 2 Apr 2002, Raystonn wrote:

> > That is far from the truth - they have internal pipelining
> > and parallelism. Their use of silicon can be optimised to balance
> > the performance of just one single algorithm. You can never do that
> > for a machine that also has to run an OS, word process and run
> > spreadsheets.
>
> Modern processors have internal pipelining and parallelism as well.
Yes - and yet they still have horrible problems every time you have a
conditional branch instruction. That's because they are trying to
convert a highly linear operation (code execution) into some kind of a
parallel form. Graphics is easier though. Each pixel and each polygon
can be treated as a stand-alone entity and can be processed in true
parallelism.

> Most of
> the processing power of today's CPUs go completely unused. It is possible
> to create optimized implementations using Single-Instruction-Multiple-Data
> (SIMD) instructions of efficient algorithms.

Which is a way of saying "Yes, you could do fast graphics on the CPU if
you put the GPU circuitry onto the CPU chip and pretend that it's now
part of the core CPU". I'll grant you *that* - but it's not the same
thing as doing the graphics in software.

> > Since 1989, CPU speed has grown by a factor of 70. Over the same
> > period the memory bus has increased by a factor of maybe 6 or so.
>
> We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
> to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years. We have
> over 16 times the memory bandwidth available today than we did just 5 years
> ago. Available memory bandwidth has been growing more quickly than
> processor clockspeed lately, and I do not foresee an end to this any time
> soon.

OK - so a factor of 70 in CPU growth and a factor of 16 in RAM speed.
My argument remains - and remember that whenever RAM gets faster, so do
the graphics cards. You can run faster - but you can't catch up if the
other guy is also running faster.

> > On the other hand, the graphics card can use heavily pipelined
> > operations to guarantee that the memory bandwidth is 100% utilised
>
> Overutilised in my opinion. The amount of overdraw performed by today's
> video cards on modern games and applications is incredible. Immediate mode
> rendering is an inefficient algorithm. Video cards tend to have extremely
> well optimized implementations of this inefficient algorithm.

That's because games *NEED* to do lots of overdraw. They are actually
pretty smart about eliminating the 'obvious' cases by doing things like
portal culling. Most of the overdraw comes from needing to do multipass
rendering (IIRC, the new Return To Castle Wolfenstein game uses up to
12 passes to render some polygons). The overdraw due to that kind of
thing is rather harder to eliminate with algorithmic sophistication. If
you need that kind of surface quality, your bandwidth out of memory
will be high no matter what.

> Kyro-based video cards perform quite well. They are not quite up to the
> level of nVidia's latest cards...

Not *quite*!!! Their best card is significantly slower than a GeForce
2MX - that's four generations of nVidia technology ago.

I agree that if this algorithm were to be implemented on a card with
the *other* capabilities of an nVidia card, then it would improve the
fill rate by perhaps a factor of two or four. (Before you argue about
that - realise that I've designed *and* built hardware and software
using this technology - and I've MEASURED its performance for 'typical'
scenes.)

But you can only draw scenes where the number of polygons being
rendered can fit into the 'scene capture' buffer. And that's the
problem with that technology. If I want to draw a scene with a couple
of million polygons in it (perfectly possible with modern cards) then
those couple of million polygons have to be STORED ON THE GRAPHICS
CARD. That's a big problem for an affordable graphics card.
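To put a rough number on that, here's a back-of-envelope sketch in C.
The 32-bytes-per-vertex figure and the assumption of plain, unshared
triangles are illustrative guesses of mine, not measured numbers:

    /* Back-of-envelope cost of capturing a whole scene on the card.
     * Assumptions (mine, for illustration only): plain triangles with
     * no vertex sharing, 32 bytes per vertex (xyz + normal + one uv,
     * all 4-byte floats). */
    #include <stdio.h>

    int main(void)
    {
        const double polygons       = 2.0e6; /* "a couple of million" */
        const double verts_per_poly = 3.0;
        const double bytes_per_vert = 32.0;

        double megabytes = polygons * verts_per_poly * bytes_per_vert
                           / (1024.0 * 1024.0);
        printf("scene capture buffer: roughly %.0f MB\n", megabytes);
        return 0;
    }

That works out to roughly 183MB. Even if indexing and vertex sharing
cut it by a factor of two or three, it's still a great deal of extra
RAM to hang off a consumer card.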
Adding another 128MB of fast RAM to store the scene in costs a lot more
than doubling the amount of processing power on the GPU. The amount of
RAM on the chip becomes a major cost driver for a $120 card.

None of those issues affect a software solution though - and it's
possible that a scene capture solution *could* be better than a
conventional immediate mode renderer - but I still think that it will
at MOST only buy you a factor of 2x or 4x pixel rate speedup, and you
have a MUCH larger gap than that to hurdle.

Also, in order to use scene capture, you are reliant on the underlying
graphics API to be supportive of this technique. Neither OpenGL nor
Direct3D is terribly helpful. You can write things like:

   Render 100 polygons.
   Read back the image they created.
   if the pixel at (123,456) is purple then
   {
     put that image into texture memory.
     Render another 100 polygons using the texture you just created.
   }

...scene capture algorithms have a very hard time with things like that
because you can only read back the image *after* it's been rendered -
but if you have to capture the entire scene in order to render it...
(There's a concrete OpenGL sketch of this pattern further down.)

I'm not saying that OpenGL and Direct3D are what you'd ideally want to
use for this kind of technique - but it'll take a lot to get another
new API accepted.

> > Everything that is speeding up the main CPU is also speeding up
> > the graphics processor - faster silicon, faster busses and faster
> > RAM all help the graphics just as much as they help the CPU.
>
> Everything starts out in hardware and eventually moves to software.

That's odd - I see the reverse happening.

First we had software 'rendering' the entire image directly to the DAC
(remember the Sinclair ZX-80?)...then we had graphics memory as a part
of the CPU address space (TRS-80, Pet, Apple ][) with hardware added to
clock it out to the DAC. Then we had hardware with its own RAM (PC's
MGA, CGA), then we added hardware to relieve the CPU of the blitting
tasks and other simple graphics functions (VGA), then we added polygon
fill (Voodoo, TNT, etc), then hardware Transform & Lighting (ATI
Radeon, GeForce-256), then things like skin and bones multi-matrix
stuff (GeForce-2), and now programmable graphics operations (GeForce-3).

I'm seeing things migrating from software *into* hardware. I can't
think of a single graphics operation that's gone the other way.

> There will come a time when the basic functionality provided by video
> cards can be easily done by a main processor.

Well, in a sense. A modern CPU can probably render pixels faster than a
Voodoo-1. So games that *used* to only run in hardware *could* now be
run in software - but modern games *NEED* all the performance of a
modern graphics card - and the CPU won't come close to meeting it. As
CPUs get faster, graphics cards get *MUCH* faster. Remember this:

>> If you doubt this, look at the progress over the last 5 or 6
>> years. In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate.
>> In 2002 GeForce-4 has a fill rate of 4.8 Billion (antialiased)
>> pixels/sec - it's 100 times faster.
>> Over the same period, your 1996 vintage 233MHz CPU has scaled
>> up to a 2GHz machine ...a mere 10x speedup.

CPUs aren't "catching up" - they are getting left behind.

> The extra features offered by the video cards, such as pixel shaders,
> are simply attempts to stand-in as a main processor.

They are adding in steps to the graphics processing that are
programmable. That's further reducing the need to go back to the CPU
where additional flexibility is needed.
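As an aside, the 'read a pixel back and decide what to do next' example
from earlier is exactly that kind of trip back to the CPU. A minimal
OpenGL 1.1 sketch of it might look like this - the draw_first_batch()
and draw_second_batch() helpers, the 256x256 copy size and the 'purple'
threshold are placeholders of mine, not code from any real application:

    #include <GL/gl.h>

    /* Placeholder drawing routines - stand-ins for "render 100 polygons". */
    extern void draw_first_batch(void);
    extern void draw_second_batch(void);

    void render_with_readback(void)
    {
        GLubyte pixel[3];

        draw_first_batch();

        /* Read back one pixel of the image those polygons created. */
        glReadPixels(123, 456, 1, 1, GL_RGB, GL_UNSIGNED_BYTE, pixel);

        /* "if the pixel at (123,456) is purple"... */
        if (pixel[0] > 200 && pixel[2] > 200 && pixel[1] < 64)
        {
            GLuint tex;

            /* Put that image into texture memory... */
            glGenTextures(1, &tex);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 0, 0, 256, 256, 0);

            /* ...and render the next batch of polygons using that texture. */
            glEnable(GL_TEXTURE_2D);
            draw_second_batch();
            glDisable(GL_TEXTURE_2D);
        }
    }

The reason scene capture hardware hates this is that glReadPixels
forces everything submitted so far to be fully rendered before it can
return - which is exactly the problem described above.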
This trend isn't an indication that we need the CPU *more* - it shows
that we don't need it as much, because where flexibility was lacking in
the rendering process, we are putting it into the graphics hardware.
This is the entire thrust of the OpenGL 2.0 initiative.

> Intel is capable of pushing microprocessor technology more quickly
> than nVidia or ATI, regardless of how much nVidia wants their
> technology to be at the center of the chipset.

So how come Intel CPUs have only doubled in speed over the last 18
months when nVidia's GPUs have sped up by a factor of four or so in the
same interval?

> > However, increasing the number of transistors you can have on
> > a chip doesn't help the CPU out very much. Their instruction
> > sets are not getting more complex in proportion to the increase
> > in silicon area - and their ability to make use of more complex
>
> What would you call MMX, SSE, SSE2, and even 3dnow? These are additional
> instructions designed to optimize the use of these new transistors.

Yes - but they are of *minute* benefit because compilers can't make
much use of them. Also, they are a small increment in functionality
that's happened over a period of something like 6 years...in that time,
graphics cards have been totally revolutionised by adding multitexture,
shader languages, transform and lighting...etc. In all that time, all
Intel have added are a couple of operations that work on four bytes in
parallel and a few low precision math operations - whereas graphics
cards have absorbed *ALL* of the OpenGL/D3D APIs!

> > instructions is already limited by the brain power of compiler
> > writers.
>
> Since when can you write a pixel shading routine in a standard C/C++
> compiler?

You can't - but you *can* use high level shading languages that are
better suited to describing surface properties than C/C++. We are close
to having RenderMan shaders implementable in hardware - there are a
couple of other shader language compilers that generate code for ATI
Radeon and nVidia GeForce cards - there is the SGI shader compiler and
of course the OpenGL 2.0 initiative. If you go look at the source code
for some of the later versions of Quake, you'll see that the authors of
that program wrote a shader language into it.

> > If you doubt this, look at the progress over the last 5 or 6
> > years. In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate.
> > In 2002 GeForce-4 has a fill rate of 4.8 Billion (antialiased)
> > pixels/sec - it's 100 times faster.
>
> Fill rate is just memory bandwidth. It is not hard to offer more memory
> channels. In fact, a dual-channel DDR chipset is coming soon for the
> Pentium 4. In May the Pentium 4 will have access to 4.3GB/s of memory
> bandwidth. Future generations will offer considerably more.

But all of those benefits are also available to graphics chips - you
have to get a 100-fold speedup from *somewhere*. RAM bandwidth *could*
possibly get you that - but then graphics cards will also have a 100x
RAM bandwidth speedup - so the relative performances will remain.

----
Steve Baker                       (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training  (817)619-2466 (Fax)
Work: [EMAIL PROTECTED]           http://www.link.com
Home: [EMAIL PROTECTED]           http://www.sjbaker.org

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel