On Nov 19, 2007 7:40 PM, Noah Meyerhans <[EMAIL PROTECTED]> wrote:
> Those of you who saw my recent blog post [1] are, no doubt, waiting with
> bated breath for the return of our mipsel porting machine.
> Unfortunately, problems persist even after addressing the cooling
> problems that I initially believed were affecting the machine's
> stability.
>
> Vaughan will run for some time, but will eventually start misbehaving.
> It stays up longer if it's not under any load, but still does eventually
> go down. Here are some of the kernel dumps that it shows. These code
> dumps are from Linux 2.6.23.1, but similar problems occur in other
> kernels.
>
> Kernel bug detected[#2]:
> Cpu 0
> $ 0 : 00000000 b0007c01 00000001 00003fff
> $ 4 : 810caa60 7fe9bf0a 80310000 000caa60
> $ 8 : 00006553 7fe9bf0a 800f1098 00000000
> $12 : 00000000 00000000 85811da0 746f6f72
> $16 : 810caa60 8347f56c 0000000e 7fe9bf0a
> $20 : 811c11b8 803321e0 856d7e2c 856d7e28
> $24 : 99999999 2ac30710
> $28 : 856d6000 856d7da8 00000001 80089e2c
> Hi : 00000000
> Lo : 00000000
> epc : 8008ad9c kmap_coherent+0xc/0xe0 Tainted: G D
> ra : 80089e2c __flush_anon_page+0x4c/0x68
> Status: b0007c03 KERNEL EXL IE
> Cause : 00808034
> PrId : 000028a0
> Process w (pid: 28428, threadinfo=856d6000, task=8116e928)
> Stack : 803321e0 8347f56c 0000000e 7fe9bf0a 800db0d0 800dad84 00000001 856d7ea0
>         800f18d0 00000000 00000011 00000000 00000030 00000000 803321e0 7fe9bf0a
>         866c8000 0000000f 000007ff 803321e0 00000000 856d7e28 856d7e2c 800db2b8
>         811c11b8 8116e928 000000d0 00000000 00000000 00000001 856d7e2c 856d7e28
>         00000000 810caa60 80332214 00000000 803321e0 00000000 0000000f 866c8000
>         ...
> Call Trace:
> [<8008ad9c>] kmap_coherent+0xc/0xe0
> [<80089e2c>] __flush_anon_page+0x4c/0x68
> [<800db0d0>] get_user_pages+0x3c4/0x4ac
> [<800db2b8>] access_process_vm+0x100/0x21c
> [<8012d91c>] proc_pid_cmdline+0xa4/0x14c
> [<8012f858>] proc_info_read+0x100/0x140
> [<800f0b4c>] vfs_read+0xc0/0x160
> [<800f10ec>] sys_read+0x54/0xa0
> [<80088d0c>] stack_done+0x20/0x3c
>
> Code: 8c820000 00021242 30420001 <00028036> 8f820014 3c038035 24420001
>       af820014 8c629240
>
> This is the first sign of trouble. The symptoms observable from
> userland are that just about any program that you try to run dies with a
> segfault. The machine never recovers from this state, and eventually
> gets worse:
>
> CPU 0 Unable to handle kernel paging request at virtual address
> 000000d0, epc == 800ebb34, ra == 800eb68c
> Oops[#4]:
> Cpu 0
> $ 0 : 00000000 90007c00 8035dc08 000000d0
> $ 4 : 8111fa80 83fdb990 0000002a 83fdb000
> $ 8 : 8035dc00 00000000 00000001 00024000
> $12 : 00000001 00080000 fff7ffff 00200200
> $16 : 8035e694 00000021 8111fa80 00000000
> $20 : 00024000 80350000 00200200 00100100
> $24 : 00100100 00000000
> $28 : 80378000 80379cd8 0000003c 800eb68c
> Hi : 00000036
> Lo : 000000d8
> epc : 800ebb34 free_block+0xec/0x1b0 Tainted: G D
> ra : 800eb68c cache_flusharray+0x74/0xfc
> Status: 90007c02 KERNEL EXL
> Cause : 0080800c
> BadVA : 000000d0
> PrId : 000028a0
> Process kswapd0 (pid: 72, threadinfo=80378000, task=8116fa08)
> Stack : 00808400 800cf650 90007c01 800b4334 0000003c 90007c01 00000000 8035e600
>         8035e610 80379da8 00000001 00000000 0000000d 800eb68c 819aae70 0000002a
>         87ead070 0000003a 8035e600 90007c01 8695e8c0 80379f48 00000001 800eb938
>         80355ca0 810d5a40 80379e74 80379f48 8695e8c0 00000001 80379e74 80116a58
>         80379e74 80379f48 00000001 80379da8 80116f30 80116f10 800d4c78 8101e2a0
>         ...
> Call Trace:
> [<800ebb34>] free_block+0xec/0x1b0
> [<800eb68c>] cache_flusharray+0x74/0xfc
> [<800eb938>] kmem_cache_free+0x110/0x118
> [<80116a58>] free_buffer_head+0x2c/0x48
> [<80116f30>] try_to_free_buffers+0x6c/0xcc
> [<800d5330>] shrink_page_list+0x640/0x7fc
> [<800d573c>] shrink_zone+0x250/0xbfc
> [<800d6700>] kswapd+0x2ac/0x434
> [<800b8658>] kthread+0x58/0x94
> [<800835a4>] kernel_thread_helper+0x10/0x18
>
> Code: 8ce30004 8ce20000 8c88004c <ac620000> ac430004 acf70000 acf60004
>       8ce2000c 8e440014
>
> The call trace in this latter case isn't always the same, but free_block
> does always seem to be at the top of the stack.
>
> It's quite possible that this is a hardware problem. Do others concur?
> Is there any chance that it is software? If it is hardware, my
> inclination would be to suspect RAM. Does anybody have a decent source
> for Cobalt Raq2 memory?
>
> noah
>
> 1. http://nlm-morgul.livejournal.com/12188.html
Hi Noah,

Last year I bought 256 MB of RAM for my Qube2 from the following URL:

http://www.satech.com/128mb-cobalt-qube-2.html

I was satisfied with their service (one of the sticks was DOA, and they
sent me a replacement at their expense, shipped across the Atlantic).

Anyway, in the meantime, if you need access to my Qube2, let me know...

Regards,
Seb.

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]

