On Sun, 23 Jan 2005, Artem Kuchin wrote:

> On Sat, 22 Jan 2005, Artem Kuchin wrote:
> >> I cvssed just an hour ago. 5.3-STABLE and cannot build kernel with
> >> WITNESS. It complains:
>
> This occurs when building WITNESS without DDB in the kernel, which was not
> a tested build case when I added "show alllocks", and apparently is a
> relatively uncommon configuration as you're the first person to bump into
> it. I've just committed the fix as subr_witness.c:1.187 in HEAD, and
> subr_witness.c:1.178.2.4 in RELENG_5. Please let me know if this doesn't
> fix the problem for you.


It fixed the problem. I am actually struggling with unpredictable, weird
lock-ups, where the host can be pinged but I cannot connect via
ssh/telnet or httpd or anything else. It happens without any visible reason.
I am running several jails with mysql and apache in each and cannot make
the whole system stable yet.

This is typically a sign of one of two problems:

- The system is live locked due to very high load, so the ithread,
netisrs, etc, in the kernel run fine, but user processes don't get a
chance to run.


- The system is dead locked due to user space processes getting wedged on
common locks, but the kernel ithreads and netisrs can keep on
responding.


I generally assume that it's a deadlock as opposed to a live lock.  I'd
compile a kernel with DDB, KDB, WITNESS, and BREAK_TO_DEBUGGER.  When the
system appears to wedge, break into the debugger using a console or serial
break (FYI: serial break is more reliable, and you get the benefit of
being able to easily copy and paste debugging output using the serial
console for DDB).  Use "show alllocks" and "show lockedvnods" to examine
most of the system's locking state.  Chances are, all the
interesting processes are stacked up on VFS or VM locks, since those kinds
of deadlocks produce the exact symptoms you describe: ping works fine
because it only hits the netisr, but when you open TCP connections, the
sshd (etc) block on VM or VFS locks attempting to fork new children or
access a file in the file system name space.  At first, the TCP
connections will establish but there will be no application data; after a
bit, they will not even return a SYN/ACK because the listen queue for the
listen socket has filled.
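
The two stages described above (connections establish but carry no data, then
no SYN/ACK at all) can be told apart from a remote machine with a small probe.
What follows is only a sketch and not something from this thread: the function
name probe_tcp and its result labels are made up for illustration, and it
assumes the service speaks first on connect (sshd, SMTP and POP3 do; httpd
waits for a request, so it would report "no_app_data" even when healthy).

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Report how far a TCP service gets before it stalls.

    "healthy"      handshake completed and the service sent some data
    "no_app_data"  handshake completed, but no data arrived before the timeout
    "no_handshake" connect timed out, i.e. no SYN/ACK came back
    "refused"      the host answered with a reset (nothing is listening)
    "unreachable"  any other network error
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "no_handshake"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"

    with sock:
        try:
            data = sock.recv(64)  # the connect timeout also applies here
        except OSError:  # includes socket.timeout
            return "no_app_data"
    return "healthy" if data else "no_app_data"
```

Run from cron on another host against port 22 of the server, a result that
moves from "healthy" to "no_app_data" and later to "no_handshake" would match
the deadlock pattern, while a failing ping at the same time would point at a
hard lock instead.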


Well, I cvsupped and recompiled the kernel with WITNESS and INVARIANTS, turned
off adaptive Giant, and got a lock-up today at 7 am. Since the server is remote
and I cannot connect a serial console to it, I took my digital camera and went
to the server. I expected to see some special message about something going
wrong, break into the debugger (CTRL+ALT+ESC) and take some pictures of the
console dumps. But I saw nothing. The last message on the screen was about an
ssh login the previous evening, and the last message in /var/log/all.log was
about the entropy save from cron. I could not break into the debugger using
CTRL+ALT+ESC. It did nothing. So it looked like a hard lock.

At this point I would like to tell the whole story.
We bought this server in May 2004 and decided to test the hardware extensively
while 5.3 was not yet out. We actually expected it around August. So we
installed 5-CURRENT and ran high-load tests (cpu, memory, disk storage, network)
from /usr/ports/benchmarks, both at the same time and one by one, for several
weeks. There was not a single glitch. After that we turned it off and waited for
RELEASE. RELEASE came and we began to set up the server as it should work. As
the server's primary mission is to host a bunch of sites, we decided to set up a
jail for each site. We did so in December and put the server in the provider's
co-location facility, several kilometers away from the office. The next day the
server locked up. We were surprised but just rebooted it. It locked up again the
next day. We cvsupped and rebuilt the system and the jails. The server locked up
the next day. During the New Year break I figured out that if more than one jail
is running, the server locks up within 24 hours with very high probability and
within 48 hours with 100% probability. I wrote to freebsd-stable about it. You
asked for a debugger dump (pcpu, list of locks, etc.). I could not do it at that
time, so I did not reply and just cvsupped at the beginning of January and
rebuilt the system and the jails again. Magically, after that I could run 5
jails (I did not try more) for over a week, and I had already decided that the
bug was fixed and I could host the site. Alas, the next glitch did not wait too
long. After a few more days I saw a strange situation: I could not connect to
the server using SSH. SSH replied with something about an auth key or similar. I
rebooted the system and ssh worked fine. I still have no idea what that was, but
I set up IPFIREWALL and a telnet server that accepts connections from only one
IP address, so that if ssh failed I could use telnet. After that I moved a real
site with perl scripts, a 1GB database and mail accounts (using qmail+vpopmail)
into one of the jails, and the next day got the next problem: I could ping the
server, but could not connect using ssh, www or telnet (ports 110, 25, 23). I
tried to recompile the kernel with INVARIANTS and WITNESS and to disable
adaptive Giant. I could not, so I wrote to you about it. You fixed the source,
and now I have recompiled again, and today I got a lock-up again with all those
options enabled, and this time I could not ping the server.
I could think that there is something wrong with the hardware, but it passed
many days of testing. Anyway, my current ideas are:


1) Something wrong with jail code
2) Something wrong with SMP code
3) Something wrong with HYPERTHREADING code
4) Something wrong with Memory disk code (md device, which i use)
5) Something wrong with the hardware


So today I opened the BIOS and turned off hyperthreading, fast string operations
and all other 'more advanced' features. I also turned off the IDE controller on
the motherboard.
This rules out a HYPERTHREADING code problem and, to some extent, a hardware
problem.

I turned off MD usage (no more memory disk, although I actually need it very
badly). So I rule out the md code problem.

Now I will run a web access test (simulation of browsing for a week). If the
server does not lock up, I will consider that I have found a workaround for some
hidden bug and that the bug is somewhere in the md code, the HT code or the
hardware.


If it locks up again, then I will give up jails and try for one more week. If it
does not lock up, the jail code is the problem.

If it locks up without jails, then I will turn off SMP and try again.

If it locks up with everything turned off, then the hardware is faulty and I
will have the further choice of hanging myself or shooting myself in the head.
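
The elimination order above can be written down as a small lookup, which makes
it easier to see what each outcome would imply. This is just an illustrative
sketch of the plan as described; the names STAGES and suspect are made up, and
"locked up" means a lock-up was seen during that stage's test run.

```python
# Stages in the order planned above; each one switches off more than the last.
# The second field is what gets blamed if that stage runs without a lock-up.
STAGES = [
    ("HT, BIOS extras, onboard IDE and md off", "md code, HT code or hardware"),
    ("jails off as well", "jail code"),
    ("SMP off as well", "SMP code"),
]


def suspect(locked_up):
    """locked_up[i] is True if stage i still locked up during its test run."""
    for (_, culprit), locked in zip(STAGES, locked_up):
        if not locked:
            return culprit
    if len(locked_up) < len(STAGES):
        return "undecided, run the next stage"
    return "hardware"


print(suspect([True, False]))  # first stage locked up, second stayed stable
```

A quiet week only makes a culprit likely rather than certain, since an earlier
configuration in this story ran for over a week before it failed.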

I would like to see your and others' comments on the story, and I have one
more question: what does options _KPOSIX_PRIORITY_SCHEDULING do? Could it be
somehow related to the problem?



The hardware is:

MB dual xeon Supermicro X5DPE-G2
CPU P4 XEON 2,667Ghz 512Kb cache 533mhz socket 604
2 Gb 266Mhz, DDR, ECC, Reg, 1GB dimm
4 HDDs 120Gb (seagate baracuda 7200.7)
3Ware Escalade 8506-4LP
Case Supermicro SC822T-550LP
Slim DVD/CD-RW Toshiba SD-R2412B IDE (OEM)


Today's kernel CONFIG, which got locked:

machine         i386
cpu             I486_CPU
cpu             I586_CPU
cpu             I686_CPU
ident           OMNI2

options         SMP

options         QUOTA

options         SCHED_4BSD              # 4BSD scheduler
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
#options        MD_ROOT                 # MD is a potential root device
#options        NFSCLIENT               # Network Filesystem Client
#options        NFSSERVER               # Network Filesystem Server
#options        NFS_ROOT                # NFS usable as /, requires NFSCLIENT
options         MSDOSFS                 # MSDOS Filesystem
options         CD9660                  # ISO 9660 Filesystem
options         PROCFS                  # Process filesystem (requires PSEUDOFS)
options         PSEUDOFS                # Pseudo-filesystem framework
options         GEOM_GPT                # GUID Partition Tables.
options         COMPAT_43               # Compatible with BSD 4.3 [KEEP THIS!]
options         COMPAT_FREEBSD4         # Compatible with FreeBSD4
#options        SCSI_DELAY=15000        # Delay (in ms) before probing SCSI
options         KTRACE                  # ktrace(1) support
options         SYSVSHM                 # SYSV-style shared memory
options         SYSVMSG                 # SYSV-style message queues
options         SYSVSEM                 # SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
#options        KBD_INSTALL_CDEV        # install a CDEV entry in /dev

device          apic            # I/O APIC

# Bus support.  Do not remove isa, even if you have no isa slots
device          isa
device          pci

# Floppy drives
device          fdc

# ATA and ATAPI devices
device          ata
device          atadisk         # ATA disk drives
device          ataraid         # ATA RAID drives
device          atapicd         # ATAPI CDROM drives
#device         atapifd         # ATAPI floppy drives
#device         atapist         # ATAPI tape drives
options         ATA_STATIC_ID   # Static device numbering

# SCSI peripherals
device          scbus           # SCSI bus (required for SCSI)
device          da              # Direct Access (disks)
device          pass            # Passthrough device (direct SCSI access)
device          twe             # 3ware ATA RAID

# atkbdc0 controls both the keyboard and the PS/2 mouse
device          atkbdc          # AT keyboard controller
device          atkbd           # AT keyboard
device          psm             # PS/2 mouse

device          vga             # VGA video card driver

device          splash          # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device          sc

device          agp             # support several AGP chipsets

# Floating point support - do not disable.
device          npx

# Power management support (see NOTES for more options)
#device         apm
# Add suspend/resume support for the i8254.
#device         pmtimer

# Serial (COM) ports
device          sio             # 8250, 16[45]50 based serial ports

# Parallel port
device          ppc
device          ppbus           # Parallel port bus (required)
device          lpt             # Printer
device          ppi             # Parallel port interface device
#device         vpo             # Requires scbus and da


device          miibus          # MII bus support
device          fxp             # Intel EtherExpress PRO/100B (82557, 82558)
device          em


device          loop            # Network loopback
device          mem             # Memory and kernel memory devices
device          io              # I/O device
device          random          # Entropy device
device          ether           # Ethernet support
#device         sl              # Kernel SLIP
#device         ppp             # Kernel PPP
device          tun             # Packet tunnel.
device          pty             # Pseudo-ttys (telnet etc)
device          md              # Memory "disks"
#device         gif             # IPv6 and IPv4 tunneling
#device         faith           # IPv6-to-IPv4 relaying (translation)

device          bpf             # Berkeley packet filter
# USB support
device          uhci            # UHCI PCI->USB interface
device          ohci            # OHCI PCI->USB interface
device          usb             # USB Bus (required)
#device         udbp            # USB Double Bulk Pipe devices
device          ugen            # Generic
device          uhid            # "Human Interface Devices"
device          ulpt            # Printer
device          umass           # Disks/Mass storage - Requires scbus and da


# FireWire support
device          firewire        # FireWire bus code
#device         sbp             # SCSI over FireWire (Requires scbus and da)
#device         fwe             # Ethernet over FireWire (non-standard!)

options         IPFIREWALL
options         IPFIREWALL_VERBOSE
options         IPFIREWALL_VERBOSE_LIMIT=10000
options         IPFIREWALL_DEFAULT_TO_ACCEPT

device          snp
device          speaker

options         DDB
options         KDB
options         BREAK_TO_DEBUGGER
options         INVARIANT_SUPPORT
options         INVARIANTS
options         WITNESS
options         WITNESS_KDB
options         WITNESS_SKIPSPIN
#options        ADAPTIVE_GIANT          # Giant mutex is adaptive.


DMESG (the config which got locked):

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
       The Regents of the University of California. All rights reserved.
FreeBSD 5.3-STABLE #3: Sun Jan 23 01:04:00 MSK 2005
   [EMAIL PROTECTED]:/usr/obj/usr/src/sys/OMNI2
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.66GHz (2665.93-MHz 686-class CPU)
 Origin = "GenuineIntel"  Id = 0xf25  Stepping = 5
 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
 Hyperthreading: 2 logical CPUs
real memory  = 4160225280 (3967 MB)
avail memory = 4077486080 (3888 MB)
ACPI APIC Table: <PTLTD          APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
cpu2 (AP): APIC ID:  6
cpu3 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
ioapic3 <Version 2.0> irqs 72-95 on motherboard
ioapic4 <Version 2.0> irqs 96-119 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <PTLTD   RSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU (2 Cx states)> on acpi0
cpu1: <ACPI CPU (2 Cx states)> on acpi0
cpu2: <ACPI CPU (2 Cx states)> on acpi0
cpu3: <ACPI CPU (2 Cx states)> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <base peripheral, interrupt controller> at device 28.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 29.0 on pci1
pci2: <ACPI PCI bus> on pcib2
pci1: <base peripheral, interrupt controller> at device 30.0 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> at device 31.0 on pci1
pci3: <ACPI PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x3000-0x303f mem 0xfc200000-0xfc21ffff irq 28 at device 2.0 on pci3
em0: Ethernet address: 00:30:48:2a:2d:bc
em0:  Speed:N/A  Duplex:N/A
em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x3040-0x307f mem 0xfc220000-0xfc23ffff irq 29 at device 2.1 on pci3
em1: Ethernet address: 00:30:48:2a:2d:bd
em1:  Speed:N/A  Duplex:N/A
pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pci4: <base peripheral, interrupt controller> at device 28.0 (no driver attached)
pcib5: <ACPI PCI-PCI bridge> at device 29.0 on pci4
pci5: <ACPI PCI bus> on pcib5
pci4: <base peripheral, interrupt controller> at device 30.0 (no driver attached)
pcib6: <ACPI PCI-PCI bridge> at device 31.0 on pci4
pci6: <ACPI PCI bus> on pcib6
twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x4000-0x400f mem 0xfc800000-0xfcffffff irq 72 at device 1.0 on pci6
twe0: [GIANT-LOCKED]
twe0: 4 ports, Firmware FE7S 1.05.00.063, BIOS BE7X 1.08.00.048
uhci0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> port 0x2000-0x201f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> port 0x2020-0x203f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801CA/CAM (ICH3) USB controller USB-C> port 0x2040-0x205f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801CA/CAM (ICH3) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci7: <ACPI PCI bus> on pcib7
pci7: <display, VGA> at device 1.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH3 UDMA100 controller> port 0x2060-0x206f,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata2: channel #1 on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
speaker0: <PC speaker> port 0x61 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
orm0: <ISA Option ROMs> at iomem 0xe0000-0xe3fff,0xc9000-0xc9fff,0xc8000-0xc8fff,0xc0000-0xc7fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 10.000 msec
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to accept, logging limited to 10000 packets/entry by default
acd0: CDRW <TOSHIBA DVD-ROM SD-R2412/1015> at ata0-slave UDMA33
twed0: <Unit 0, RAID5, Normal> on twe0
twed0: 343417MB (703318656 sectors)
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
Mounting root from ufs:/dev/twed0s1a
em0: Link is up 100 Mbps Full Duplex


