Marc Jones wrote:
> On Fri, Nov 27, 2009 at 2:05 AM, Nathan Williams <[email protected]>
> wrote:
>> Nathan Williams wrote:
>>> Marc Jones wrote:
>>>> On Tue, Nov 24, 2009 at 1:09 AM, Nathan Williams <[email protected]>
>>>> wrote:
>>>>> Marc Jones wrote:
>>>>>> On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams
>>>>>> <[email protected]> wrote:
>>>>>>> I managed to get the commercial BIOS to boot on my board and diffed it
>>>>>>> with coreboot:
>>>>>>>
>>>>>>> http://coreboot.pastebin.com/m39b22c21
>>>>>>>
>>>>>>> The only differences I can see are related to interrupts, which
>>>>>>> shouldn't matter in relation to my RAM problems.
>>>>>>>
>>>>>>> I have also run a memtest86 with the commercial BIOS (from bootable
>>>>>>> CDROM) and as a payload in coreboot.
>>>>>>> The commercial BIOS didn't have any errors, but my coreboot did. So
>>>>>>> the hardware can't be too bad.
>>>>>> That looks like just the southbridge cs5536 target. The memory
>>>>>> differences would be in the processor geodelx target. Can you send
>>>>>> those results?
>>>>>>
>>>>>> Marc
>>>>>>
>>>>> I did some new MSR dumps.
>>>>>
>>>>> Diff:
>>>>> ./msrtool -t geodelx -t cs5536 -d amd_ref_bios
>>>>> http://coreboot.pastebin.com/m5e487f87
>>>>>
>>>>> AMD NAS reference BIOS:
>>>>> ./msrtool -t geodelx -t cs5536 -l -s amd_ref_bios
>>>>> http://coreboot.pastebin.com/madc04ac
>>>>>
>>>>> My Coreboot:
>>>>> ./msrtool -t geodelx -t cs5536 -l -s nathan_bios
>>>>> http://coreboot.pastebin.com/m7f35d855
>>>>>
>>>>>
>>>>> The diffs I did today show some differences with GLCP_DELAY_CONTROLS.
>>>>> Last time I added some code to force it to match the commercial BIOS
>>>>> GLCP_DELAY_CONTROLS MSR, but it didn't seem to make any difference.
>>>>>
>>>>> I also tested all the SODIMMS I have here (about 10) with the commercial
>>>>> BIOS.
>>>>> Each time I did a msrtool diff to one I saved on disk.
>>>>>
>>>>> Most are 333MHz, but 2 are 400MHz. There weren't any changes to the MSRs.
>>>>>
>>>>> Could there be an issue with the initialisation sequence that reading MSRs
>>>>> after booting won't show? Also, quite a few MSRs aren't defined in
>>>>> geodelx.c yet.
>>>>> Are there any obvious ones that should be added in?
>>>>>
>>>> --- AMD NAS reference BIOS
>>>> +++ Nathan's coreboot v3
>>>> #
>>>> # GLCP_DELAY_CONTROLS
>>>> #
>>>> -0x4c00000f 0x83f1_00aa_5696_0404
>>>> +0x4c00000f 0x8271_005a_5696_0404
>>>>
>>>> It looks like coreboot and the ref bios detect different dimm
>>>> configuration. This timing setup could be part of the instability (I
>>>> don't think it explains the reset problem). Look at the code here:
>>>> SetDelayControl(void) and anywhere else that GLCP_DELAY_CONTROLS gets
>>>> set to see what might be happening. Make sure that MTest is disabled
>>>> in the ref bios setup. This setting is based on the number of devices
>>>> (load) there is on the dimm.
>>>>
>>>> I didn't realize that so few registers were in the msr tool for
>>>> geodelx. You should add these:
>>>> 20000018h R/W Refresh and SDRAM Program (MC_CF07_DATA) 10071007_00000040h Page 227
>>>> 20000019h R/W Timing and Mode Program (MC_CF8F_DATA) 18000008_287337A3h Page 229
>>>> 2000001Ah R/W Feature Enables (MC_CF1017_DATA) 00000000_11080001h Page 231
>>>> 2000001Bh RO Performance Counters (MC_CFPERF_CNT1) 00000000_00000000h Page 232
>>>> 2000001Ch R/W Counter and CAS Control (MC_PERCNT2) 00000000_00FF00FFh Page 233
>>>> 2000001Dh R/W Clocking and Debug (MC_CFCLK_DBUG) 00000000_00001300h Page 233
>>>>
>>>> 4C00000Fh R/W GLCP I/O Delay Controls (GLCP_DELAY_CONTROLS) 00000000_00000000h Page 549
>>>> 4C000014h R/W GLCP System Reset and PLL Control (GLCP_SYS_RSTPLL) Bootstrap specific Page 554
>>>>
>>>> Marc
>>>>
>>> I've now added the MSRs and uploaded to pastebin:
>>>
>>> AMD NAS:
>>> http://coreboot.pastebin.com/m53aed60b
>>>
>>> My coreboot:
>>> http://coreboot.pastebin.com/md23bc6a
>>>
>>> ./msrtool -d AMD_NAS:
>>> http://coreboot.pastebin.com/m77663de5
>>>
>>> Tomorrow I'll try the tests on the NAS hardware, instead of our own
>>> motherboards
>>> just in case there are some hidden hardware issues.
>>>
>>> Regards,
>>> Nathan
>>>
>> On the NAS reference board I got the following diff between coreboot
>> and the commercial BIOS:
>>
>> http://coreboot.pastebin.com/m1353db1a
>>
>> As you can see there are a lot of latency differences.
>> Unfortunately it was only later that I realised that the differences are
>> because the bootstraps are set to bypass, which means coreboot uses 266 as
>> the speed, whereas the commercial bios uses 333. So when I repeat the same
>> on our boards, the only difference in the geodelx MSRs is:
>>
>> # MC_CFCLK_DBUG
>> -0x2000001d 0x0000000000000000
>> +0x2000001d 0x0000000000001000
>> # 12 TRISTATE_DIS TRI-STATE Disable
>> -0: Tri-stating enabled
>> +1: Tri-stating disabled
>
>
> Nathan,
>
> I don't think the tri-state disable bit explains the problems you have
> seen. Since the memory has the same settings, the problem must be
> somewhere else. You will need to go back to the reboot path to
> investigate. It seems like something in the reset isn't doing a
> complete reset, which causes a problem with the cache disable.
>
> Marc
>
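For anyone following the register values quoted above, the two MSR differences can be reduced to bit positions with a few lines of Python. This is only a rough sketch: parse_msr and differing_bits are made-up helper names (they are not part of msrtool), and the values are copied by hand from the quoted diffs. Which fields those bits belong to still has to be looked up in the Geode LX data book.

```python
# Sketch only: helper names are hypothetical, values are copied from the
# msrtool diffs quoted earlier in this thread.

def parse_msr(text):
    """Parse an msrtool-style 64-bit value such as '0x83f1_00aa_5696_0404'.

    Underscores and stray spaces are dropped before conversion.
    """
    return int(text.replace("_", "").replace(" ", ""), 16)


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    x = (a ^ b) & 0xFFFFFFFFFFFFFFFF
    return [bit for bit in range(64) if (x >> bit) & 1]


# GLCP_DELAY_CONTROLS (MSR 0x4c00000f): reference BIOS vs. coreboot
ref_delay = parse_msr("0x83f1_00aa_5696_0404")
cb_delay = parse_msr("0x8271_005a_5696_0404")

# MC_CFCLK_DBUG (MSR 0x2000001d): the two values from the last diff
dbug_a = parse_msr("0x0000000000000000")
dbug_b = parse_msr("0x0000000000001000")

print("GLCP_DELAY_CONTROLS differing bits:", differing_bits(ref_delay, cb_delay))
print("MC_CFCLK_DBUG differing bits:", differing_bits(dbug_a, dbug_b))
```

The lower 32 bits of GLCP_DELAY_CONTROLS are identical in both dumps, so only the upper half needs to be checked against SetDelayControl().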
I suspect that the reset problem only occurs when I'm using a laptop hard drive off the 44-pin IDE connector on our board. I have tried booting with a 3.5" drive and external 12V, and with that setup I can't replicate the problem: a reboot from fsck works fine.

Hopefully the next PCB revision will perform better, because we've moved the 5V plane further away from the DDR tracks.

I don't know if I mentioned another problem that has similar symptoms. Some RAM causes the same cache disable problem, even if there are no IDE devices connected. This happens from power-up, so it's not a reset issue.

Nathan

