On Thu, 24 Jan 2002, Steven W. Orr wrote:
> I'm not totally stoopid, but I really am not understanding this issue
> very well.

You are not alone.  There has been a fair amount of discussion on LKML
(the Linux Kernel Mailing List) about the issue.  *This issue is not
nailed down yet*.  It is entirely possible that further investigation
will discover new information that invalidates previous conclusions.
The story continues to unfold, as they say in the news business.

> I'd pay money to go to a GNHLUG meeting where someone could explain this
> to me with only a slightly restricted number of syllables per word.

This is my understanding of the issue.  I *know* it is not complete,
and it may well be completely bogus.  But for lack of anything better:

The Athlon processor engages in something called "speculative writes".
I am not quite sure how the speculation works, but the result is that
data in the processor cache is written out to main memory "early".

AGP has something called a GART (Graphics Address Remapping Table) that
lets the video card access main memory in a direct fashion, to increase
performance.  Or something like that.

The kernel is responsible for mapping main memory to virtual memory.
It is also responsible for marking data as cacheable by the various
levels and layers of memory caching in the system.

The kernel is marking data being written to the AGP card as cacheable,
and so the Athlon processor is caching it.  However, the GART is not
aware of this caching, and is doing something not quite compatible.
The result is that data in the processor cache does not match data in
some other location.  When the Athlon does the speculative write to
write the cache to main memory, everything goes to hell.

Apparently, these speculative writes are sane and allowed by the Athlon
design, and the GART behavior is allowed by the AGP design, and the
problem is that the kernel is marking memory as cacheable when it
should not.

I do not understand the details here, so if this seems like
hand-waving, that is because it is.
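For what it is worth, the mismatch described above can be sketched as a
toy model.  This is only an illustration of a stale cached copy being
written back over newer data; it is not how the Athlon, the GART, or the
kernel actually work, and every name in it (ToyCache, write_back, the
address 0x1000) is made up for the example.

```python
# Toy model only: a write-back cache in front of "main memory", plus a
# second agent (standing in for the AGP side) that writes memory directly.
# All names are hypothetical; no real hardware or kernel API is modelled.

class ToyCache:
    def __init__(self, memory):
        self.memory = memory   # shared backing store: address -> value
        self.lines = {}        # cached copies: address -> value

    def read(self, addr):
        # Fill the cache on a miss, then serve the cached copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write_back(self, addr):
        # Push the cached copy out to memory, even if the CPU never
        # modified it.  This stands in for the "early" write.
        self.memory[addr] = self.lines[addr]


memory = {0x1000: "old"}
cpu_cache = ToyCache(memory)

cpu_cache.read(0x1000)         # page is marked cacheable, so the CPU caches "old"
memory[0x1000] = "new"         # the other agent updates memory behind the cache
cpu_cache.write_back(0x1000)   # the stale cached copy is written out

print(memory[0x1000])          # prints "old": the newer data has been lost
```

If the page had not been marked cacheable in the first place, the toy
CPU would never hold a copy to write back, which is the shape of the fix
being discussed.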
:-)

As for why "mem=nopentium" and the memory page size would make a
difference, well, the kernel folks aren't too sure of that either.  It
may be an accident having to do with page alignment boundaries, or it
may just reduce (but not eliminate) the chance of the bug triggering,
or who knows what.  This stuff is heavy wizardry [1].  :-)

(It also underscores the concern many have with the 2.4 kernel: If
almost no one really understands the kernel's memory manager design and
implementation, how can we be sure it works? [2])

Footnotes
---------
[1] http://www.tuxedo.org/~esr/jargon/html/entry/heavy-wizardry.html
[2] See LKML postings last month complaining about this.

-- 
Ben Scott <[EMAIL PROTECTED]>
| The opinions expressed in this message are those of the author and do not |
| necessarily represent the views or policy of any other person, entity or  |
| organization.  All information is provided without warranty of any kind.  |
