Hi Buchan,

>>> My Samba box stalls after some usage: mapped drives disappear and
>>> users can't read from or write to the drives. The stalls happen at
>>> random. I'm running 2.4.19-16mdksmp and Samba 2.2.7a-9.2mdk. May I
>>> ask whether this is a kernel bug or a Samba bug? Does anyone know a
>>> fix for it? I checked the memory from the BIOS, and it didn't report
>>> any errors.
>>
>> BIOS memory check is (mostly) useless. Use memtest86 or similar.
>
> Sure, I will test it with memtest86 and report back. I have been
> running LM9.0 with Samba on this box for 3/4 of a year now. The problem
> only arose in the last 2 months, at random. I swapped in brand-new
> Crucial Micron ECC DDR266 SDRAM, but the problem still persists. BTW,
> the BIOS memory check is quite extensive (Intel claims it scans the
> memory block by block). It takes about 1 to 2 minutes to scan the
> memory. I'm not sure how this compares to memtest86. I guess I will
> have to wait until after hours before I can run memtest86.
I ran memtest and found no errors. Do you have any other suggestions for
how I can troubleshoot this further? There are no cards plugged into the
system; it just runs software RAID. So it seems to be an XFS, md or
Samba bug. Maybe I could try upgrading Samba to 2.2.8a-2mdk from your
web server. Are there any potential gotchas I should watch out for?

>>> /var/log/kernel/warnings
>>> ------------------------
>>> Oct 27 09:19:12 smbserver kernel: xfs_force_shutdown(md(9,5),0x8)
>>> called from line 1039 of file xfs_trans.c. Return address =
>>> 0xe08ae312
>>> Oct 27 09:19:12 smbserver kernel: Corruption of in-memory data
>>> detected. Shutting down filesystem: md(9,5)
>>> Oct 27 09:19:12 smbserver kernel: Please umount the filesystem, and
>>> rectify the problem(s)
>>
>> This seems to point quite strongly to either hardware (most likely
>> memory) or kernel (xfs driver or md driver, it seems you are running
>> software raid?) If the kernel has problems with a filesystem, there's
>> nothing much samba can do about it ...
>
> I'm using software RAID. Do you know if there are recent updates to
> the Mandrake kernel that may fix bugs in the XFS and md drivers? The
> funny thing is that only Samba dies. SSH and the other services still
> work.
>
>>> /var/log/kernel/errors
>>> ----------------------
>>> Oct 27 10:36:44 smbserver kernel: Unknown bridge resource 2:
>>> assuming transparent
>>> Oct 27 10:36:44 smbserver kernel: PCI: Unable to handle 64-bit
>>> address space for
>>> Oct 27 10:36:44 smbserver kernel: PCI: Unable to handle 64-bit
>>> address space for
>>> Oct 27 10:36:44 smbserver kernel: Unknown bridge resource 2:
>>> assuming transparent
>>> Oct 27 10:36:44 smbserver kernel: PCI: Device 00:1f.1 not available
>>> because of resource collisions
>>
>> You need to give some more information on the hardware on this
>> machine, but something does not look right ... what's in
>> /proc/interrupts ?
>
> I'm using an Intel SE7500WV2S Server Board, BIOS Version 2.01, Build
> 0483.
> My /proc/interrupts is as follows. I have seen the boot screen
> complain about a resource collision, but couldn't find the cause.
> I've disabled all unnecessary ports in the BIOS (e.g., USB).
>
>            CPU0       CPU1       CPU2       CPU3
>   0:    1385866          0          0          0   IO-APIC-edge   timer
>   1:          7          0          0          0   IO-APIC-edge   keyboard
>   2:          0          0          0          0   XT-PIC         cascade
>   8:          1          0          0          0   IO-APIC-edge   rtc
>  12:        197          0          0          0   IO-APIC-edge   PS/2 Mouse
>  15:          5          0          0          0   IO-APIC-edge   ide1
>  30:     677193          0          0          0   IO-APIC-level  eth1
>  31:     923339          0          0          0   IO-APIC-level  eth0
>  49:      38985          0          0          0   IO-APIC-level  aic7xxx
>  50:         16          0          0          0   IO-APIC-level  aic7xxx
> NMI:          0          0          0          0
> LOC:    1385678    1385677    1385676    1385676
> ERR:          0
> MIS:          0

Regards,
Norman
