Ronald G Minnich <[EMAIL PROTECTED]> writes:

> I've been tracking a tough problem with the M754LMR mainboard. Symptom was
> that in intel_set_var_mtrr, the mtrr #200 (physbase 0) was getting loaded
> with 0x106, which is invalid, which resulted in a GPF.
> 
> Using the Arium ICE, I have tracked it to the following:
> 
>        // it is recommended that we disable and enable cache when we
>         // do this.
>         /* Disable cache */
>         /* Write back the cache and flush TLB */
>         asm volatile ("movl  %%cr0, %0\n\t"
>                       "orl  $0x40000000, %0\n\t"
>                       "wbinvd\n\t"
>                       "movl  %0, %%cr0\n\t"
>                       "wbinvd\n\t":"=r" (tmp)::"memory");
> 
> As soon as the first wbinvd instruction is executed, the stack variable
> (i.e. the memory location containing the variable)  for the physbase is
> corrupted; it had 0x6, and it is replaced with the old value and
> consequently has either 0x106 or 0x146.
> 
> Ok. So the cache has some junk inside, left over from before, I hope.  Is
> it possible our cache setup has gotten somewhat out of order? in other
> words, do we need a wbinvd earlier in the setup, or is there something
> wrong with this sequence? Are we not properly cleaning the cache out
> somehow before we enable it? I'm stumped.
> 
> Collins has not seen this on his box. I have seen it on every single
> m754lmr I've tried. One other person is also seeing this problem on one
> other machine, a VIA system. It dies with the same POST code, 0x60, which
> means it was trying to set MTRR #200 and failed.
> 
> I'm worried that our cache setup and invalidate may not be totally solid
> any more.

That could be, but except for deleting a dummy case, the code really
hasn't changed.  Could we get a poll of which processors are involved?
This may be an Intel CPU erratum that we need to load microcode to fix.
I don't think we have tracked anything down to the point where we have
needed to do that before, but it is possible.
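
For the poll, the useful datum is the family/model/stepping triple that
CPUID leaf 1 returns in EAX, since errata and microcode updates are
keyed on it.  A minimal sketch of the decode, assuming the raw EAX value
has already been read; decode_cpu_sig and struct cpu_sig are
hypothetical names for illustration, not existing code in our tree:

```c
#include <stdint.h>

/* Decoded fields of the CPUID leaf 1 EAX "signature" value. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Hypothetical helper: decode a raw CPUID.1:EAX value.
 * The extended fields only apply for certain base family values. */
static struct cpu_sig decode_cpu_sig(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;

    s.stepping = eax & 0xF;

    /* Extended family is added only when the base family is 0xF. */
    s.family = base_family;
    if (base_family == 0xF)
        s.family += ext_family;

    /* Extended model supplies the high nibble for families 6 and 0xF. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        s.model |= ext_model << 4;

    return s;
}
```

Reporting the three decoded numbers next to the board name should be
enough to see whether the failures cluster on one stepping.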

There is also the interesting fact that the Linux kernel does more
work around this, so our sequence may not be all that is required for
truly reliable operation.  But this code is in the Linux kernel, so in
and of itself it should not be evil.
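
Whatever the root cause turns out to be, a check on the value headed
for MTRR #200 would turn the GPF into something we can report.  In the
low 32 bits of a variable-range PHYSBASE register, bits 7:0 hold the
memory type and bits 11:8 are reserved, so 0x6 (write-back) is fine
while 0x106 and 0x146 both set reserved bit 8.  A sketch only; the
macro and function names below are hypothetical and are not part of
intel_set_var_mtrr:

```c
#include <stdbool.h>
#include <stdint.h>

/* Low dword of a variable-range MTRR PHYSBASE register (assumed layout):
 * bits 7:0 = memory type, bits 11:8 = reserved, bits 31:12 = base. */
#define MTRR_TYPE_MASK      0x000000FFu
#define MTRR_BASE_RSVD_MASK 0x00000F00u

/* Architecturally defined memory types: UC, WC, WT, WP, WB. */
static bool mtrr_type_valid(uint32_t type)
{
    switch (type) {
    case 0: /* uncacheable */
    case 1: /* write-combining */
    case 4: /* write-through */
    case 5: /* write-protect */
    case 6: /* write-back */
        return true;
    default:
        return false;
    }
}

/* Return true if the low dword is safe to write to a PHYSBASE MSR. */
static bool mtrr_physbase_lo_valid(uint32_t lo)
{
    if (lo & MTRR_BASE_RSVD_MASK)
        return false;               /* reserved bits set: wrmsr would fault */
    return mtrr_type_valid(lo & MTRR_TYPE_MASK);
}
```

Calling mtrr_physbase_lo_valid() just before the wrmsr and emitting a
distinct POST code on failure would separate "the value got corrupted"
from "the write itself failed".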

Eric
