On Sun, Jul 21, 2013 at 06:41:52PM +0100, Ben Hutchings wrote:
> On Sun, 2013-07-21 at 09:54 +0100, Roger Leigh wrote:
> > If the bug is in amd64_edac_mod, there are only two possible commits
> > which could cause the problem:
> > 
> > 1eef12825 amd64_edac: Correct DIMM sizes
> > 94c1acf2c amd64_edac: Add Family 16h support
> > 
> > Which are the only commits between 3.8 and 3.9 (and none made since)
> 
> The crash is at this line:
> 
>                       csrow->channels[1]->dimm->nr_pages = row_dct1_pages;

Hmm.

If I'm reading the above correctly, 3.8 works for you, Roger, correct?

If so, can you please enable CONFIG_EDAC_DEBUG, rebuild 3.8 and boot
your machine with it and send me the full dmesg of the boot?

> with csrow->channels[1]->dimm == NULL.
> 
> This code was introduced by the first commit above.  Does the patch
> below fix this?
> 
> Ben.
> 
> ---
> [PATCH] amd64_edac: Fix crash in init_csrows() for memory controller in 
> 64-bit mode
> 
> init_csrows() assumes all processors after K8 have 2 memory channels.
> But these processors support a mode where only one channel is used.
> It seems that csrow_enabled() may still return true for the second
> channel (BIOS bug?).

Ok, I think I know what the problem is:

[    5.815246] EDAC amd64: DRAM ECC enabled.
[    5.816328] EDAC amd64: F15h detected (node 0).
[    5.817397] EDAC amd64: MC: 0:     0MB 1:     0MB
[    5.818379] EDAC amd64: MC: 2:     0MB 3:     0MB
[    5.819332] EDAC amd64: MC: 4:     0MB 5:     0MB
[    5.820250] EDAC amd64: MC: 6:     0MB 7:     0MB
[    5.821176] EDAC amd64: MC: 0:  4096MB 1:  4096MB
[    5.822131] EDAC amd64: MC: 2:  4096MB 3:  4096MB
[    5.823048] EDAC amd64: MC: 4:     0MB 5:     0MB
[    5.823927] EDAC amd64: MC: 6:     0MB 7:     0MB
[    5.824818] EDAC amd64: using x4 syndromes.
[    5.825680] EDAC amd64: MCT channel count: 1

Roger's DIMMs are only on the one channel and the second one is empty.
Btw, Roger, you might want to move one of the DIMMs to another DIMM
socket on the board so that you can use both channels for performance
reasons.

I'm saying "one of the DIMMs" because I'm assuming those 4G above are
dual-ranked DIMMs and you have two 8G DIMMs on the board.

If they're single-ranked, i.e. 4G each, then you don't have any choice,
because your board has only 4 DIMM slots anyway, AFAICT from the
manual:

http://www.asus.com/Motherboards/SABERTOOTH_990FX_R20/#support_Download_10

That would point to a bug, because you're still using only one DCT.

Btw, if you have two DIMMs on there, please place them according to the
recommended memory configurations, i.e. one in A2 and the other in B2.

Here's my layout, for example:

[    5.890887] EDAC MC: DCT0 chip selects:
[    5.890888] EDAC amd64: MC: 0:  2048MB 1:  2048MB
[    5.890889] EDAC amd64: MC: 2:  2048MB 3:  2048MB
[    5.890890] EDAC amd64: MC: 4:     0MB 5:     0MB
[    5.890891] EDAC amd64: MC: 6:     0MB 7:     0MB
[    5.890893] EDAC MC: DCT1 chip selects:
[    5.890894] EDAC amd64: MC: 0:  2048MB 1:  2048MB
[    5.890894] EDAC amd64: MC: 2:  2048MB 3:  2048MB
[    5.890895] EDAC amd64: MC: 4:     0MB 5:     0MB
[    5.890896] EDAC amd64: MC: 6:     0MB 7:     0MB
[    5.890897] EDAC amd64: using x4 syndromes.
[    5.890901] EDAC amd64: MCT channel count: 2

And I have 4 DIMM slots occupied.
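
To make the comparison between the two logs concrete, here is a minimal
user-space sketch that counts how many DCTs have at least one non-empty
chip select. It is only an illustration, not the driver's logic: the
function name populated_dcts() and the NUM_CS constant are made up for
this example, and it assumes that each group of four "MC:" lines in the
output above (eight chip selects) belongs to one DCT, as in the labelled
DCT0/DCT1 listing.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CS 8 /* chip selects listed per DCT in the dmesg output */

/*
 * Hypothetical helper: mb[dct][cs] holds the size in MB printed for
 * each chip select. Returns the number of DCTs (0, 1 or 2) that have
 * at least one chip select with a non-zero size.
 */
int populated_dcts(const unsigned int mb[2][NUM_CS])
{
	int count = 0;

	assert(mb != NULL);

	for (int dct = 0; dct < 2; dct++) {
		for (size_t cs = 0; cs < NUM_CS; cs++) {
			if (mb[dct][cs] != 0) {
				count++;
				break; /* one populated chip select is enough */
			}
		}
	}
	return count;
}
```

Fed with the sizes from the first log (one group all 0MB, the other with
four 4096MB entries), this yields 1; with the second log it yields 2,
which lines up with the "MCT channel count" lines in both outputs.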

> Check pvt->channel_count before csrow_enabled(), and remove the family
> number conditions.
> 
> Reported-by: Roger Leigh <rle...@debian.org>
> Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
> Cc: 717...@bugs.debian.org
> ---
>  drivers/edac/amd64_edac.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 8b6a034..be9c2fe 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -2084,10 +2084,8 @@ static int init_csrows(struct mem_ctl_info *mci)
>        */
>       for_each_chip_select(i, 0, pvt) {
>               bool row_dct0 = !!csrow_enabled(i, 0, pvt);
> -             bool row_dct1 = false;
> -
> -             if (boot_cpu_data.x86 != 0xf)
> -                     row_dct1 = !!csrow_enabled(i, 1, pvt);

Ok, this shouldn't be set if DCT1 doesn't have enabled csrows. So yes,
Roger, that debugging output would be of great help.
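
For reference, the guard described in the patch text ("check
pvt->channel_count before csrow_enabled()") can be modelled in user
space roughly as follows. This is a sketch under assumptions, not the
actual patch: the quoted diff above shows only the removed lines, so
struct model_pvt, model_csrow_enabled() and model_row_dct1() are
hypothetical stand-ins for the driver's private data, csrow_enabled()
and the row_dct1 computation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CSROWS 8 /* chip selects per DCT, as in the dmesg output */

/* Hypothetical stand-in for the driver's private data. */
struct model_pvt {
	int channel_count;              /* "MCT channel count" */
	bool cs_enabled[2][NUM_CSROWS]; /* [dct][csrow] enable bits */
};

/* Models csrow_enabled(i, dct, pvt): is chip select i enabled on dct? */
bool model_csrow_enabled(int i, int dct, const struct model_pvt *pvt)
{
	return pvt->cs_enabled[dct][i];
}

/*
 * Models the proposed row_dct1 computation: only consult the DCT1
 * enable bit when the controller actually reports two channels.
 * With one channel the second channel must not be touched, even if
 * its enable bit happens to be set.
 */
bool model_row_dct1(int i, const struct model_pvt *pvt)
{
	assert(pvt != NULL && i >= 0 && i < NUM_CSROWS);

	if (pvt->channel_count < 2)
		return false;

	return model_csrow_enabled(i, 1, pvt);
}
```

With channel_count == 1 the function returns false regardless of the
DCT1 enable bits, so the code that dereferences the second channel's
dimm pointer would be skipped.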

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.