Hi Buchan,

>>> My Samba box stalls after some usage, mapped drives disappear and
>>> users can't write or read from drives. The stalls happen randomly.
>>> I'm running 2.4.19-16mdksmp and Samba 2.2.7a-9.2mdk. Is this a
>>> kernel bug or a Samba bug? Does anyone know a fix for it? I checked
>>> the memory from the BIOS, and it didn't report any errors.
>>
>> BIOS memory check is (mostly) useless. Use memtest86 or similar.
>
> Sure, I will test it with memtest86 and report back. I have been
> running LM9.0 with Samba on this box for 3/4 of a year now. The
> problem only appeared in the last 2 months, at random. I swapped in
> brand-new Crucial Micron ECC DDR266 SDRAM, but the problem still
> persists. BTW, the BIOS memory check is quite extensive (Intel claims
> to scan it block by block). It takes about 1 to 2 minutes to scan the
> memory. Not sure how this compares to memtest86. I guess I will have
> to wait until after hours before I can run memtest86.

I ran memtest86 and found no errors. Do you have any other suggestions for
troubleshooting this further? There are no cards plugged into the system; it
just runs software RAID. So it seems to be a bug in either XFS, md or Samba.
Maybe I could try upgrading Samba to 2.2.8a-2mdk from your web server.
Are there any potential gotchas I should watch out for?
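
To line up the Samba stalls with the XFS shutdowns, a small script can pull
the shutdown events out of /var/log/kernel/warnings. This is only a rough
sketch in Python: the function name find_xfs_shutdowns is made up for
illustration, and the pattern assumes each log entry sits on a single line
in the same format as the entries quoted below.

```python
import re

# Matches syslog lines such as:
#   Oct 27 09:19:12 smbserver kernel: xfs_force_shutdown(md(9,5),0x8) called ...
# Assumption: the entry is on one line (the mail client wrapped the quoted copy).
SHUTDOWN_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"xfs_force_shutdown\((?P<dev>.+),(?P<flags>0x[0-9a-fA-F]+)\)"
)


def find_xfs_shutdowns(lines):
    """Return (timestamp, device, flags) for every xfs_force_shutdown entry."""
    events = []
    for line in lines:
        match = SHUTDOWN_RE.match(line)
        if match:
            events.append(
                (match.group("when"), match.group("dev"), match.group("flags"))
            )
    return events
```

Passing it an open file object for the warnings log gives a list of
timestamps that can be compared against the times users reported the
mapped drives disappearing.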

>>> /var/log/kernel/warnings
>>> ------------------------
>>> Oct 27 09:19:12 smbserver kernel: xfs_force_shutdown(md(9,5),0x8)
>>> called from line 1039 of file xfs_trans.c.  Return address =
>>> 0xe08ae312
>>> Oct 27 09:19:12 smbserver kernel: Corruption of in-memory data
>>> detected. Shutting down filesystem: md(9,5)
>>> Oct 27 09:19:12 smbserver kernel: Please umount the filesystem, and
>>> rectify the problem(s)
>>
>> This seems to point quite strongly to either hardware (most likely
>> memory) or kernel (xfs driver or md driver, it seems you are running
>> software raid?) If the kernel has problems with a filesystem, there's
>> nothing much samba can do about it ...
>
> I'm using software RAID. Do you know if there are recent updates to
> the Mandrake kernel that may fix bugs in XFS and md drivers? Funny
> thing is that only Samba dies. SSH and others still work.
>
>>> /var/log/kernel/errors
>>> ----------------------
>>> Oct 27 10:36:44 smbserver kernel: Unknown bridge resource 2:
>>> assuming transparent
>>> Oct 27 10:36:44 smbserver kernel: PCI: Unable to handle 64-bit
>>> address space for
>>> Oct 27 10:36:44 smbserver kernel: PCI: Unable to handle 64-bit
>>> address space for
>>> Oct 27 10:36:44 smbserver kernel: Unknown bridge resource 2:
>>> assuming transparent
>>> Oct 27 10:36:44 smbserver kernel: PCI: Device 00:1f.1 not available
>>> because of resource collisions
>>
>> You need to give some more information on the hardware on this
>> machine, but something does not look right ... what's in
>> /proc/interrupts ?
>
> I'm using an Intel SE7500WV2S Server Board, BIOS version 2.01, build
> 0483. My /proc/interrupts is as follows. I have seen the boot screen
> complain about resource collisions, but couldn't find the cause.
> I've disabled all unnecessary ports in the BIOS (e.g., USB).
>
>            CPU0    CPU1    CPU2    CPU3
>   0:    1385866       0       0       0  IO-APIC-edge  timer
>   1:          7       0       0       0  IO-APIC-edge  keyboard
>   2:          0       0       0       0        XT-PIC  cascade
>   8:          1       0       0       0  IO-APIC-edge  rtc
>  12:        197       0       0       0  IO-APIC-edge  PS/2 Mouse
>  15:          5       0       0       0  IO-APIC-edge  ide1
>  30:     677193       0       0       0 IO-APIC-level  eth1
>  31:     923339       0       0       0 IO-APIC-level  eth0
>  49:      38985       0       0       0 IO-APIC-level  aic7xxx
>  50:         16       0       0       0 IO-APIC-level  aic7xxx
> NMI:          0       0       0       0
> LOC:    1385678 1385677 1385676 1385676
> ERR:          0
> MIS:          0
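
For reference, the per-CPU distribution in the table above can be totalled
with a few lines of Python. This is a minimal sketch, not a diagnosis: the
function name per_cpu_irq_totals is hypothetical, and it only counts the
numbered device IRQ rows, skipping the NMI, LOC, ERR and MIS summary rows.

```python
def per_cpu_irq_totals(text):
    """Sum the numbered IRQ rows of /proc/interrupts output per CPU column."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        # Only rows labelled with an IRQ number are device interrupts;
        # NMI/LOC/ERR/MIS are kernel summary counters and are skipped.
        if not sep or not label.strip().isdigit():
            continue
        counts = rest.split()[: len(cpus)]
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals
```

Run against the table above (without the "> " quote prefix), it shows every
device interrupt being counted on CPU0 and none on CPU1 to CPU3.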

Regards,
Norman



