Neil Graham wrote:
> On Tue, 2008-12-30 at 20:41 -0700, Jordan Crouse wrote:
>>> I'm curious as to why reads from video memory are so slow, On standard
>>> video cards it's slow because there is quite a division between the CPU
>>> and the video memory, but on the geode isn't the video memory shared in
>>> the same SDRAM as Main memory.
>> It is, in that they share the same physical RAM chips, but they are
>> controlled by different entities - one is managed by the system memory
>> controller and the other is handled by the GPU. At start up time, the
>> memory is carved up by the firmware, and after the top of system RAM is
>> established, video and system memory behave for all intents and purposes
>> like separate components. Put simply, there is no way to directly
>> address video memory from the system memory. Access to the video memory
>> has to happen via PCI cycles, and for obvious reasons the active video
>> region has the cache disabled, accounting for relatively slow readback.
>
> That makes my brain melt, you can't address it even though it's on the
> same chip!?! Even as far back as the PCjr the deal was that sharing
> video memory cost some performance due to taking turns with cycles but
> it gave some back with easy access to the memory for all. Has the
> geode cunningly managed to provide a system that combines all the
> disadvantages of separate memory with all the disadvantages of shared?
>
> One wonders what would happen if you wired some lines to the chips so
> that the memory appeared in two places, would you get access to the ram
> (with the usual 'you pays your money, you takes your chances' caveats
> about coherency)
>
> I'm not a hardware person, but that all just seems odd.

You are missing the point - this model wasn't designed so that the system
could somehow sneakily address video memory; it was designed so that the
system designer could eliminate the added cost and board real estate of a
separate bank of memory chips. See also
http://en.wikipedia.org/wiki/Shared_Memory_Architecture.

>> That said, the read from memory performance is still worse then you
>> might expect - I never really got a good answer from
>> the silicon guys as to why.
>>
> being hit with the full sdram latency every access maybe?
>
> Is it feasible to try with caches enabled and require the software to
> flush as needed.

Ask around - I don't think that you'll find anybody too keen on having the
X server execute a cache invalidate a half dozen times a second.

Anyway, you are getting distracted and solving the wrong problem. You
should be more concerned about limiting the number of times that the X
server reads from video memory rather than worrying about how fast the
read is.

If I can rant for a second (and this isn't targeted at Neil specifically,
just in general): this is another in a list of more or less hard
constraints that the current XO design has. Throughout the history of the
project, it seems to me that developers have been more biased toward
trying to eliminate those constraints rather than making the software work
in spite of them. The processor is too slow - everybody immediately wants
to overclock. There is too little memory - enter a few dozen schemes for
compressing it or swapping it. The XO platform has limitations, most of
which were introduced by choice for power or cost reasons. The limitations
are clearly documented and were known by all, at least when the project
started. The understanding was that the software would have to be adjusted
to fit the hardware, not the other way around. Over time, we seem to have
lost that understanding.

Software engineering is hard - software engineering for resource-constrained
systems is even harder. In this day and age geeks like us have become
accustomed to always having the latest and greatest hardware at our
fingertips, and so the software that we write is also for the latest and
greatest. And so, when confronted with a system such as the XO, our first
instinct is to just plop our software on it and watch it go. That attitude
is further reinforced by the fact that the Geode is x86-based - just like
our desktops. It should just work, right?

We know better - or at least, we should know better. The solution to the
performance problems is good old-fashioned elbow grease. We have to take
our software that is naturally biased toward the year 2007 and make it work
for the year 1995. That's going to involve fixing bugs in the drivers, but
also re-thinking how the software works - and finding situations where the
software might be inadvertently doing the wrong thing.

Let me give you an example - as recently as X 1.5, operations involving an
a8 alpha mask worked like this:

* Draw a 1x1 rectangle in video memory containing the source color for
  the operation
* Read the source color from video memory
* Perform the mask operation with the source color

This isn't smart for any kind of processor or GPU, running at 2 GHz or half
a GHz. The X server knows the source color from the start, so why don't we
just use it? We get away with it on a modern processor, but it kills us on
the Geode.

These are the sorts of things that we need to find and squash - and yes, it
will be very time consuming and a little boring. But if you care about
performance, I mean really care about it and are not just out for the quick
fix, these are the sorts of things that we need to do.

Jordan
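
To make the a8 mask example above concrete, here is a rough sketch in C of
the two approaches. It is not X server or Geode driver code: vram_read(),
vram_write(), apply_mask() and the other names are hypothetical, and
"video memory" is just a one-word array with a read counter standing in
for the uncached readback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for uncached video memory. Every read is counted
 * so that the readback is visible. Not real X server or Geode code. */
static uint32_t vram[1];
static unsigned long vram_reads;

static uint32_t vram_read(unsigned idx) { vram_reads++; return vram[idx]; }
static void vram_write(unsigned idx, uint32_t v) { vram[idx] = v; }

/* Scale each 8-bit channel of an ARGB color by an a8 mask value. */
static uint32_t apply_mask(uint32_t argb, uint8_t m)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = (argb >> shift) & 0xff;
        out |= ((c * m + 127) / 255) << shift;
    }
    return out;
}

/* Pattern described above: draw a 1x1 solid, read it back, then mask. */
static void mask_via_readback(uint32_t color, const uint8_t *mask,
                              uint32_t *dst, size_t n)
{
    vram_write(0, color);          /* "draw" the 1x1 source rectangle */
    uint32_t src = vram_read(0);   /* slow read from video memory */
    for (size_t i = 0; i < n; i++)
        dst[i] = apply_mask(src, mask[i]);
}

/* Alternative: the caller already knows the color, so use it directly. */
static void mask_direct(uint32_t color, const uint8_t *mask,
                        uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = apply_mask(color, mask[i]);
}

/* Runs both paths on the same input, checks that they agree, and returns
 * how many video memory reads were needed in total. */
static unsigned long demo_readback_count(void)
{
    const uint8_t mask[4] = { 0, 64, 128, 255 };
    uint32_t a[4], b[4];

    vram_reads = 0;
    mask_via_readback(0xff336699u, mask, a, 4);
    mask_direct(0xff336699u, mask, b, 4);
    for (size_t i = 0; i < 4; i++)
        assert(a[i] == b[i]);
    return vram_reads;
}
```

Both functions produce the same pixels; the only difference is that the
first one pays for a read from video memory on every operation and the
second one pays for none.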