Hi,
>> My Samba box stalls after some usage, mapped drives disappear and
>> users can't write or read from drives. The stalls happen randomly.
>> I'm running 2.4.19-16mdksmp and Samba 2.2.7a-9.2mdk. May I ask is
>> this a kernel bug or Samba bug? Does anyone know a fix for it? I
>> checked the memory from BIOS, they didn't report any errors.
>
> BIOS memory check is (mostly) useless. Use memtest86 or similar.
Sure, I will test it with memtest86 and report back. I have been running
LM9.0 with Samba on this box for about 3/4 of a year now. The problem only
appeared in the last 2 months, and it strikes at random. I swapped in brand
new Crucial Micron ECC DDR266 SDRAM, but the problem still persists. BTW,
the BIOS memory check is quite extensive (Intel claims it scans the memory
block by block) and takes about 1 to 2 minutes. I'm not sure how that
compares to memtest86. I guess I will have to wait until after hours before
I can run memtest86.
>> /var/log/kernel/warnings
>> ------------------------
>> Oct 27 09:19:12 smbserver kernel: xfs_force_shutdown(md(9,5),0x8)
>> called from line 1039 of file xfs_trans.c. Return address =
>> 0xe08ae312
>> Oct 27 09:19:12 smbserver kernel: Corruption of in-memory data
>> detected. Shutting down filesystem: md(9,5)
>> Oct 27 09:19:12 smbserver kernel: Please umount the filesystem, and
>> rectify the problem(s)
>
> This seems to point quite strongly to either hardware (most likely
> memory) or kernel (xfs driver or md driver, it seems you are running
> software raid?) If the kernel has problems with a filesystem, there's
> nothing much samba can do about it ...
Yes, I'm using software RAID. Do you know if there are recent updates to the
Mandrake kernel that may fix bugs in the XFS and md drivers? The funny thing
is that only Samba dies; SSH and other services still work.
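To check whether each Samba stall lines up with one of these filesystem shutdowns, I could pull the events out of the kernel log with something like the Python sketch below. The regular expression is an assumption modeled only on the single log line quoted above, and the function name is made up for illustration.

```python
import re

# Pattern modeled on the one xfs_force_shutdown line quoted above;
# other kernel versions may format the message differently.
SHUTDOWN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"xfs_force_shutdown\((?P<dev>.+),(?P<flags>0x[0-9a-fA-F]+)\)"
)

def find_xfs_shutdowns(lines):
    """Return (timestamp, device, flags) for each xfs_force_shutdown line."""
    events = []
    for line in lines:
        m = SHUTDOWN_RE.match(line)
        if m:
            events.append(
                (m.group("stamp"), m.group("dev"), int(m.group("flags"), 16))
            )
    return events

# The two log lines from this thread, unwrapped to one line each.
sample = [
    "Oct 27 09:19:12 smbserver kernel: xfs_force_shutdown(md(9,5),0x8) "
    "called from line 1039 of file xfs_trans.c. Return address = 0xe08ae312",
    "Oct 27 09:19:12 smbserver kernel: Corruption of in-memory data "
    "detected. Shutting down filesystem: md(9,5)",
]
print(find_xfs_shutdowns(sample))
```

Feeding it the lines of /var/log/kernel/warnings instead of `sample` would give a list of timestamps to compare against the times users report the drives disappearing.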
>> /var/log/kernel/errors
>> ----------------------
>> Oct 27 10:36:44 smbserver kernel: Unknown bridge resource 2: assuming
>> transparent
>> Oct 27 10:36:44 smbserver kernel: PCI: Unable to handle 64-bit
>> address space for
>> Oct 27 10:36:44 smbserver kernel: PCI: Unable to handle 64-bit
>> address space for
>> Oct 27 10:36:44 smbserver kernel: Unknown bridge resource 2: assuming
>> transparent
>> Oct 27 10:36:44 smbserver kernel: PCI: Device 00:1f.1 not available
>> because of resource collisions
>
> You need to give some more information on the hardware on this
> machine, but something does not look right ... what's in
> /proc/interrupts ?
I'm using an Intel SE7500WV2S server board, BIOS version 2.01, build 0483. I
have seen the boot screen complain about resource collisions, but couldn't
find the cause. I've disabled all unnecessary ports in the BIOS (e.g., USB).
My /proc/interrupts is as follows:
           CPU0       CPU1       CPU2       CPU3
  0:    1385866          0          0          0    IO-APIC-edge  timer
  1:          7          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:        197          0          0          0    IO-APIC-edge  PS/2 Mouse
 15:          5          0          0          0    IO-APIC-edge  ide1
 30:     677193          0          0          0   IO-APIC-level  eth1
 31:     923339          0          0          0   IO-APIC-level  eth0
 49:      38985          0          0          0   IO-APIC-level  aic7xxx
 50:         16          0          0          0   IO-APIC-level  aic7xxx
NMI:          0          0          0          0
LOC:    1385678    1385677    1385676    1385676
ERR:          0
MIS:          0
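One thing that stands out in this output is that every numbered IRQ has been serviced by CPU0 only. Whether that has anything to do with the stalls I can't say, but it is easy to check over time with a small Python sketch like the one below. The function name is hypothetical, and the parsing assumes the column layout shown above (a CPU header row, then `label: count count ...` rows).

```python
def device_irq_totals(text):
    """Sum the per-CPU counts of the numbered (device) IRQ rows."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():      # skip NMI, LOC, ERR, MIS
            continue
        counts = rest.split()[:len(cpus)]    # one count per CPU column
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals

# The /proc/interrupts contents posted above.
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
0: 1385866 0 0 0 IO-APIC-edge timer
1: 7 0 0 0 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
8: 1 0 0 0 IO-APIC-edge rtc
12: 197 0 0 0 IO-APIC-edge PS/2 Mouse
15: 5 0 0 0 IO-APIC-edge ide1
30: 677193 0 0 0 IO-APIC-level eth1
31: 923339 0 0 0 IO-APIC-level eth0
49: 38985 0 0 0 IO-APIC-level aic7xxx
50: 16 0 0 0 IO-APIC-level aic7xxx
NMI: 0 0 0 0
LOC: 1385678 1385677 1385676 1385676
ERR: 0
MIS: 0
"""

totals = device_irq_totals(SAMPLE)
print(totals)
```

On the live box the same function can be fed `open("/proc/interrupts").read()` instead of `SAMPLE`.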
>> /var/log/samba/log.winbindd
>> ---------------------------
>> [2003/10/27 10:37:23, 0] nsswitch/winbindd.c:process_loop(626)
>> process_loop: Invalid request size (1701996389) sent, should be
>> (1304)
>
> Some winbind users have reported winbind in 2.2.8a works a bit
> better, you can find packages on the samba FTP mirrors for all
> supported releases (hmm, except ldap-enabled packages for 9.2 ... I
> must do this still ...).
Thanks. I will wait for your updates. BTW, http://ranger.dnsalias.com/ is
one of my most-watched sites. Too bad Samba 3.0 has had some nasty bugs;
otherwise, I would love to upgrade and test it out.
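One more note on the log.winbindd line quoted above: the bogus request size (1701996389) looks like text rather than a length. The Python sketch below shows the arithmetic. It assumes the size was written as a 32-bit little-endian integer, which is my guess for an x86 box and not something the log confirms.

```python
import struct

bogus_size = 1701996389  # value from the log.winbindd line above

# Assumption: the length field is a 32-bit little-endian integer (x86).
raw = struct.pack("<I", bogus_size)

print(hex(bogus_size), raw)  # 0x65726765 b'egre'
```

All four bytes are printable ASCII letters, which may mean that plain text, rather than a winbindd request structure, arrived on the socket. That is only a guess on my part.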
Regards,
Norman