Posted again, this time from the e-mail address I am registered with.

---------- Forwarded message ----------
From: Francesco Pietra <francesco.pie...@accademialucchese.it>
Date: Fri, Jan 2, 2009 at 8:26 PM
Subject: Failure to load amd64 overcome, though mem problems
To: amd64 Debian <debian-am...@lists.debian.org>, debian-users <debian-user@lists.debian.org>

Hi:

Near the end of last year, while on vacation, I posted to amd64 about a failure to start amd64 lenny on a Supermicro H8QC8 motherboard. This board has the nVidia CK804 chipset, which is also the memory controller, and the AMD 8132. It carries 4 dual opteron 875 CPUs, two WD Raptor disks under RAID, and 8 KVR400D4R3A/2G plus 8 KVR400D4R3A/1G memory modules. Lenny is set up not to load the X system. The computer is powered through an APC 1500 and an Enermax EGX1000EWL. Cooling is extremely efficient.

The system was shut down correctly, at a time when top indicated 24GB total RAM. After a few days untouched, the OS did not load. The screen showed a series of lines starting with RDX RBP R10 R13 FS CS CR2 DR0 DR3, followed by

Call Trace:
  ffff do_oage
  fff handle_mm_fault
  fff vma_link
  fff error_exit
  fff clear_user
  fff padzero
  fff get_arg_page
  fff copy_strings
  fff search_binary_handler
  fff do_execve
  fff sys_execve
  fff stub_execve

Such lines then alternated, and the whole <Call Trace> started anew several times. Finally everything disappeared from the screen and could not be recovered with the keyboard.

Knoppix 5.3.1 loaded correctly and detected all 8 logical CPUs, and the raid1 partitions (mdadm) were OK. However, it detected 20GB total mem instead of the expected 24GB.

memtest86+-2.11 detected 17GB total mem and was left to run for the full 8 cycles (which took seven hours), reporting no mem errors. DMI mem device info showed:

  DIMM 0 to DIMM 7:    size 64;     speed 400;  type DDR
  DIMM 8 to DIMM 10:   size empty;  speed 200;  type DDR
  DIMM 11:             size 2048;   speed 200;  type DDR
  DIMM 12 to DIMM 15:  size 64;     speed 200;  type DDR

On rebooting, lenny started correctly.

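As a cross-check on the figures above, here is a minimal Python sketch of the arithmetic. It only restates numbers quoted in this mail (8 x 2G and 8 x 1G modules installed, 20GB seen by Knoppix, 17GB seen by memtest); the variable and function names are made up for illustration, and the tools' "GB" figures are taken at face value without unit conversion.

```python
# Installed modules as described in the mail: 8 x 2 GB and 8 x 1 GB.
installed_gb = [2] * 8 + [1] * 8
expected_gb = sum(installed_gb)

# Totals reported by each tool, in GB, exactly as quoted in the mail.
reported_gb = {"Knoppix 5.3.1": 20, "memtest86+ 2.11": 17}


def missing_gb(expected, reported):
    """Return, per tool, how many GB fewer than expected were detected."""
    return {tool: expected - total for tool, total in reported.items()}


shortfall = missing_gb(expected_gb, reported_gb)
for tool, gap in shortfall.items():
    print(f"{tool}: {reported_gb[tool]} GB seen of {expected_gb} GB, {gap} GB missing")
```

The two tools disagree with each other as well as with the installed total, which is why the per-slot DMI listing matters more than either summary figure.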
Top showed 18079572k total, including while running a parallelized application that engaged all 8 CPUs. lshw agreed with memtest as to the DIMMs, except for the one marked size = 2048, which lshw marked size=64. I was surprised that both memtest and lshw reported half of the slots at speed=200; I tentatively assume this is a feature of the mainboard, not of the mem slots.

=============

The memory currently detected is insufficient for my computations, and I believe the DIMMs reported as empty need attention. There is no system maintainer here, and since I assembled the computer myself I have to try to restore the system alone. My question is where to start at this point. The memory modules seem to be plugged in as before, but I have not tried removing and replugging them. The four blocks on the mainboard were filled as follows:

  DIMMA-2A  1GB
  DIMMA-2B  1GB
  DIMMA-1A  2GB
  DIMMA-1B  2GB

  DIMMB-1B  2GB
  DIMMB-1A  2GB
  DIMMB-2B  1GB
  DIMMB-2A  1GB

  DIMMC-2A  1GB
  DIMMC-2B  1GB
  DIMMC-1A  2GB
  DIMMC-1B  2GB

  DIMMD-1B  2GB
  DIMMD-1A  2GB
  DIMMD-2B  1GB
  DIMMD-2A  1GB

=============================

This mail originally started under the hypothesis that the problem was some degradation of lenny. I now understand that it is largely off topic on both amd64 and users. I only hope that experienced users may offer suggestions from their own experience.

Thanks and happy 2009!

francesco pietra