This is a long one, but mainly because I've tried to include notes about
what I've already looked at.  Thanks in advance for taking the time to
read this.

I have a FreeBSD 6.1-RELEASE/amd64 system which routinely needs to
accept traffic at fairly high rates.  'systat -if' suggests 428551GB
(not a typo, but possibly a display bug in 'systat') over the past 63
days, or an average rate of a bit over 600Mb/sec.  Display bug or not,
'time tcpdump ...' tends to back up the high rate:

[EMAIL PROTECTED] % sudo time tcpdump -i bge1 -n -w /dev/null -c 1000000
tcpdump: WARNING: bge1: no IPv4 address assigned
tcpdump: listening on bge1, link-type EN10MB (Ethernet), capture size 96 bytes
1000000 packets captured
1000395 packets received by filter
167 packets dropped by kernel
0.268u 0.153s 0:06.84 5.9%      901+3236k 0+0io 0pf+0w
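
The figures above can be sanity-checked with a little arithmetic.  This
is only a rough sketch: it assumes systat's "GB" means 10**9 bytes, and
it treats the full 6.84 seconds of wall-clock time as capture time, which
slightly understates the packet rate.  The function names are made up for
illustration.

```python
# Rough sanity check of the systat and tcpdump figures quoted above.
# Assumption: "GB" is 10**9 bytes; with 2**30 the rate comes out ~7% higher.

SECONDS_PER_DAY = 86400


def average_bits_per_second(total_bytes, days):
    """Average bit rate for total_bytes received over the given number of days."""
    return total_bytes * 8 / (days * SECONDS_PER_DAY)


def packets_per_second(packets, elapsed_seconds):
    """Average packet rate over the wall-clock time reported by time(1)."""
    return packets / elapsed_seconds


def drop_fraction(dropped, received):
    """Share of packets seen by the filter that the kernel reported as dropped."""
    return dropped / received


systat_rate = average_bits_per_second(428551 * 10**9, 63)
capture_pps = packets_per_second(1000395, 6.84)
capture_loss = drop_fraction(167, 1000395)

print("average rate: %.0f Mb/sec" % (systat_rate / 1e6))
print("capture rate: %.0f packets/sec" % capture_pps)
print("kernel drops: %.4f%%" % (capture_loss * 100))
```

Under those assumptions the systat average works out to roughly 630
Mb/sec, and the capture ran at something like 146K packets/sec.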


What I'm aiming for, of course, is zero packet loss.  Realizing that's
probably impossible for this system given its load, I'm trying to do
what I can to minimize loss.

The system is running a somewhat leaner kernel than GENERIC.  Notable
changes include:

        * PREEMPTION disabled - /sys/conf/NOTES says this helps with
          interactivity.  I don't care about interactive performance
          on this host.

        * COMPAT_FREEBSD4, COMPAT_LINUX32, and COMPAT_43 are removed.
          They appear to be unneeded.

        * SMP is enabled, as this is a dual-core box (not HTT!).

        * Many devices are removed, e.g., ncr(4), sym(4), adv(4), and
          other unnecessary block devices; anything relating to cardbus;
          de(4), bce(4), ti(4), wb(4), ed(4), ex(4), lnc(4), and a
          number of other network devices that aren't going to ever be
          used; etc.

        * All wlan(4) and related drivers are gone.

        * pf(4), pflog(4), and some of the ALTQ stuff have been added
          in, but are not actively used on this host (at the moment).

        * ZERO_COPY_SOCKETS, MAC_BSDEXTENDED, MAC_PARTITION, and MAC
          are enabled.

        * Most importantly, HZ=1000, and DEVICE_POLLING and
          AUTO_EOI_1 are included.  (AUTO_EOI_1 was added because
          /sys/amd64/conf/NOTES says this can save a few microseconds
          on some interrupts.  I'm not worried about suspend/resume, but
          definitely want speed, so it got added.)


As mentioned above, this host is running FreeBSD/amd64, so there's no
need to remove support for I586_CPU, et al; that stuff was never there
in the first place.

Since kern.polling.enable is marked as deprecated in
/sys/kern/kern_poll.c, I'm enabling polling specifically for the
interface receiving the high-volume traffic.  (It is NOT enabled for the
other interface on this system, but traffic loads there are orders of
magnitude lower, so I didn't think it was necessary.)

As mentioned above, I've got HZ set to 1000.  Per /sys/amd64/conf/NOTES,
I'd considered setting it to 2000, but I discovered previously that
FreeBSD's RFC1323 support breaks at that setting.  I documented this on
-hackers last year:

http://lists.freebsd.org/pipermail/freebsd-hackers/2005-December/014829.html


Since I've not seen word on a correction for this being added to
FreeBSD, I've limited HZ to 1000.

After reading polling(4) a couple times, I set kern.polling.burst_max to
1000.  The manpage says that "each interface can receive at most (HZ *
burst_max) packets per second", and the default setting is 150, which is
described as "adequate for 100Mbit network and HZ=1000."  I figured,
"Hey, gigabit, how about ten times the default?" but that's prevented by
"#define MAX_POLL_BURST_MAX 1000" in /sys/kern/kern_poll.c.

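To put numbers on the polling(4) formula quoted above, here is a minimal
sketch of the per-interface ceiling (HZ * burst_max) at the default and
maximum burst_max values.  The observed rate is an estimate taken from
the tcpdump run earlier in this message (1000395 packets in 6.84 seconds)
and is an average, so it says nothing about short bursts.  The function
name is hypothetical.

```python
# Per-interface ceiling as described in polling(4):
# at most HZ * burst_max packets per second.


def polling_ceiling_pps(hz, burst_max):
    """Upper bound on packets per second an interface can receive when polled."""
    return hz * burst_max


HZ = 1000
# Average rate estimated from the tcpdump run; an assumption, not a measurement
# of peak load.
observed_pps = 1000395 / 6.84

for burst_max in (150, 1000):
    ceiling = polling_ceiling_pps(HZ, burst_max)
    print("burst_max=%4d  ceiling=%7d pps  ceiling/observed=%.2f"
          % (burst_max, ceiling, ceiling / observed_pps))
```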
In theory that might've been good enough, but polling(4) says that
kern.polling.burst is "[the] [m]aximum number of packets grabbed from
each network interface in each timer tick.  This number is dynamically
adjusted by the kernel, according to the programmed user_frac,
burst_max, CPU speed, and system load."  I keep seeing
kern.polling.burst hit a thousand, which leads me to believe that
kern.polling.burst_max needs to be higher.

For example:

        secs since
          epoch       kern.polling.burst
        ----------    ------------------
        1166133997       1000
        1166134006        550
        1166134015        877
        1166134024       1000
        1166134033       1000
        1166134042       1000
        1166134051       1000
        1166134060       1000
        1166134069       1000
        1166134078       1000


Unfortunately, raising the effective limit appears to be possible only
through a) patching /sys/kern/kern_poll.c to allow larger burst_max
values; or b) setting HZ to 2000, as indicated in one of the NOTES,
which will effectively hose certain TCP connectivity because of the
RFC1323 breakage.  Looked at another
way, both essentially require changes to source code, the former being
fairly obvious, and the latter requiring fixes to the RFC1323 support.
Either way, I think that's a bit beyond my abilities; I have NO
illusions about my kernel h4cking sk1llz.

Other possibly relevant data points:

        * System load hovers right around 1.

        * The system has almost zero disk activity.

        * With polling off:

          - 'vmstat 5' consistently shows about 13K context switches
            and ~6800 interrupts
          - 'vmstat -i' shows 2K interrupts per CPU, consistently 6286
            for bge1, and near zero for everything else
          - CPU load drops to 0.4-0.8, but CPU idle time sits around 80%

        * With polling on, kern.polling.burst_max=150:

          - kern.polling.burst holds at 150
          - 'vmstat 5' shows context switches hold around 2600, with
            interrupts holding around 30K
          - 'vmstat -i' shows bge1 interrupt rate of 6286 (but total
            doesn't increase!), other rates stay the same (looks like
            possible display bugs in 'vmstat -i' here!)
          - CPU load holds at 1, but CPU idle time usually stays >95%

        * With polling on, kern.polling.burst_max=1000:

          - kern.polling.burst is frequently 1000 and almost always >850
          - 'vmstat 5' shows context switches unchanged, but interrupts
            are 150K-190K
          - 'vmstat -i' unchanged from burst_max=150
          - CPU load and CPU idle time very similar to burst_max=150


So, with all that in mind.....  Any ideas for improvement?  Apologies in
advance for missing the obvious.  'dmesg' and kernel config are attached.


-- 
Alan Amesbury
OIT Security and Assurance
University of Minnesota
machine         amd64
cpu             HAMMER
ident           SPECIALIZED

# To statically compile in device wiring instead of /boot/device.hints
#hints          "GENERIC.hints"         # Default places to look for devices.

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

#options        SCHED_ULE               # ULE scheduler
options         SCHED_4BSD              # 4BSD scheduler
#options        PREEMPTION              # Enable kernel thread preemption
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         MD_ROOT                 # MD is a potential root device
options         NFSCLIENT               # Network Filesystem Client
options         NFSSERVER               # Network Filesystem Server
options         NFS_ROOT                # NFS usable as /, requires NFSCLIENT
options         MSDOSFS                 # MSDOS Filesystem
options         CD9660                  # ISO 9660 Filesystem
options         PROCFS                  # Process filesystem (requires PSEUDOFS)
options         PSEUDOFS                # Pseudo-filesystem framework
options         GEOM_GPT                # GUID Partition Tables.
options         COMPAT_IA32             # Compatible with i386 binaries
options         COMPAT_FREEBSD5         # Compatible with FreeBSD5
options         SCSI_DELAY=5000         # Delay (in ms) before probing SCSI
options         KTRACE                  # ktrace(1) support
options         SYSVSHM                 # SYSV-style shared memory
options         SYSVMSG                 # SYSV-style message queues
options         SYSVSEM                 # SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
options         AHC_REG_PRETTY_PRINT    # Print register bitfields in debug
                                        # output.  Adds ~128k to driver.
options         AHD_REG_PRETTY_PRINT    # Print register bitfields in debug
                                        # output.  Adds ~215k to driver.
options         ADAPTIVE_GIANT          # Giant mutex is adaptive.

options         SMP                     # Symmetric MultiProcessor Kernel

# Workarounds for some known-to-be-broken chipsets (nVidia nForce3-Pro150)
device          atpic                   # 8259A compatability

# Bus support.
device          acpi
device          isa
device          pci
device          mem
device          io

# Floppy drives
device          fdc

# ATA and ATAPI devices
device          ata
device          atadisk         # ATA disk drives
device          ataraid         # ATA RAID drives
device          atapicd         # ATAPI CDROM drives
device          atapifd         # ATAPI floppy drives
device          atapist         # ATAPI tape drives
options         ATA_STATIC_ID   # Static device numbering

# SCSI Controllers
device          ahc             # AHA2940 and onboard AIC7xxx devices
device          ahd             # AHA39320/29320 and onboard AIC79xx devices
device          amd             # AMD 53C974 (Tekram DC-390(T))
device          isp             # Qlogic family
device          mpt             # LSI-Logic MPT-Fusion

# SCSI peripherals
device          scbus           # SCSI bus (required for SCSI)
device          ch              # SCSI media changers
device          da              # Direct Access (disks)
device          sa              # Sequential Access (tape etc)
device          cd              # CD
device          pass            # Passthrough device (direct SCSI access)
device          ses             # SCSI Environmental Services (and SAF-TE)

# RAID controllers interfaced to the SCSI subsystem
device          amr             # AMI MegaRAID
device          ciss            # Compaq Smart RAID 5*
device          dpt             # DPT Smartcache III, IV - See NOTES for options
device          hptmv           # Highpoint RocketRAID 182x
device          iir             # Intel Integrated RAID
device          ips             # IBM (Adaptec) ServeRAID
device          mly             # Mylex AcceleRAID/eXtremeRAID
device          twa             # 3ware 9000 series PATA/SATA RAID

# RAID controllers
device          aac             # Adaptec FSA RAID
device          aacp            # SCSI passthrough for aac (requires CAM)
device          ida             # Compaq Smart RAID
device          twe             # 3ware ATA RAID

# atkbdc0 controls both the keyboard and the PS/2 mouse
device          atkbdc          # AT keyboard controller
device          atkbd           # AT keyboard
device          psm             # PS/2 mouse

device          vga             # VGA video card driver

device          splash          # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device          sc

device          agp             # support several AGP chipsets

# Serial (COM) ports
device          sio             # 8250, 16[45]50 based serial ports

# If you've got a "dumb" serial or parallel PCI card that is
# supported by the puc(4) glue driver, uncomment the following
# line to enable it (connects to the sio and/or ppc drivers):
#device         puc

# PCI Ethernet NICs.
device          em              # Intel PRO/1000 adapter Gigabit Ethernet Card
device          ixgb            # Intel PRO/10GbE Ethernet Card
device          txp             # 3Com 3cR990 (``Typhoon'')
device          vx              # 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device          miibus          # MII bus support
device          bfe             # Broadcom BCM440x 10/100 Ethernet
device          bge             # Broadcom BCM570xx Gigabit Ethernet
device          dc              # DEC/Intel 21143 and various workalikes
device          fxp             # Intel EtherExpress PRO/100B (82557, 82558)
device          lge             # Level 1 LXT1001 gigabit Ethernet
device          nge             # NatSemi DP83820 gigabit Ethernet
device          re              # RealTek 8139C+/8169/8169S/8110S
device          rl              # RealTek 8129/8139
device          sis             # Silicon Integrated Systems SiS 900/SiS 7016
device          sk              # SysKonnect SK-984x & SK-982x gigabit Ethernet
device          tx              # SMC EtherPower II (83c170 ``EPIC'')
device          xl              # 3Com 3c90x (``Boomerang'', ``Cyclone'')


# Pseudo devices.
device          loop            # Network loopback
device          random          # Entropy device
device          ether           # Ethernet support
device          tun             # Packet tunnel.
device          pty             # Pseudo-ttys (telnet etc)
device          md              # Memory "disks"
device          gif             # IPv6 and IPv4 tunneling
device          faith           # IPv6-to-IPv4 relaying (translation)

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device          bpf             # Berkeley packet filter

# USB support
device          uhci            # UHCI PCI->USB interface
device          ohci            # OHCI PCI->USB interface
device          ehci            # EHCI PCI->USB interface (USB 2.0)
device          usb             # USB Bus (required)
#device         udbp            # USB Double Bulk Pipe devices
device          ugen            # Generic
device          uhid            # "Human Interface Devices"
device          ukbd            # Keyboard
device          ulpt            # Printer
device          umass           # Disks/Mass storage - Requires scbus and da
device          ums             # Mouse

# FireWire support
device          firewire        # FireWire bus code
device          sbp             # SCSI over FireWire (Requires scbus and da)
device          fwe             # Ethernet over FireWire (non-standard!)




options         ALTQ
options         ALTQ_CBQ
options         ALTQ_HFSC
options         ALTQ_PRIQ
options         ALTQ_NOPCC
device          pf
device          pflog
options         BRIDGE
options         ZERO_COPY_SOCKETS
options         MAC
options         MAC_BSDEXTENDED
options         MAC_PARTITION
options         HZ=1000
options         SC_HISTORY_SIZE=1000
options         SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK)
options         SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
options         DEVICE_POLLING
options         AUTO_EOI_1
options         INCLUDE_CONFIG_FILE
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE-p10 #1: Thu Oct 12 14:14:54 CDT 2006
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SPECIALIZED
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) D CPU 2.80GHz (2800.11-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf44  Stepping = 4
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  Cores per package: 2
real memory  = 4563402752 (4352 MB)
avail memory = 4140404736 (3948 MB)
ACPI APIC Table: <DELL   PE850   >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
Security policy loaded: TrustedBSD MAC/BSD Extended (mac_bsdextended)
Security policy loaded: TrustedBSD MAC/Partition (mac_partition)
ioapic0: Changing APIC ID to 2
ioapic1: Changing APIC ID to 3
ioapic1: WARNING: intbase 32 != expected base 24
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 32-55 on motherboard
acpi0: <DELL PE850> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 28.4 on pci0
pci4: <ACPI PCI bus> on pcib4
bge0: <Broadcom BCM5721 Gigabit Ethernet, ASIC rev. 0x4101> mem 0xfe8f0000-0xfe8fffff irq 16 at device 0.0 on pci4
miibus0: <MII bus> on bge0
brgphy0: <BCM5750 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:15:c5:60:1b:dc
pcib5: <ACPI PCI-PCI bridge> at device 28.5 on pci0
pci5: <ACPI PCI bus> on pcib5
bge1: <Broadcom BCM5721 Gigabit Ethernet, ASIC rev. 0x4101> mem 0xfe6f0000-0xfe6fffff irq 17 at device 0.0 on pci5
miibus1: <MII bus> on bge1
brgphy1: <BCM5750 10/100/1000baseTX PHY> on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:15:c5:60:1b:dd
uhci0: <UHCI (generic) USB controller> port 0xbce0-0xbcff irq 20 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0xbcc0-0xbcdf irq 21 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0xbca0-0xbcbf irq 22 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0: <Intel 82801GB/R (ICH7) USB 2.0 controller> mem 0xfeb00400-0xfeb007ff irq 20 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb3: EHCI version 1.0
usb3: wrong number of companions (7 != 3)
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: <Intel 82801GB/R (ICH7) USB 2.0 controller> on ehci0
usb3: USB revision 2.0
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci6: <ACPI PCI bus> on pcib6
pci6: <display, VGA> at device 5.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH7 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
atapci1: <Intel ICH7 SATA300 controller> port 0xbc98-0xbc9f,0xbc90-0xbc93,0xbc80-0xbc87,0xbc78-0xbc7b,0xbc60-0xbc6f mem 0xfeb00000-0xfeb003ff irq 20 at device 31.2 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xec000-0xeffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDRW <TSSTcorpCD-RW/DVD-ROM TSL462C/DE05> at ata0-master UDMA33
ad4: 152587MB <WDC WD1600JS-75NCB2 10.02E03> at ata2-master SATA150
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ad4s1a
bge0: link state changed to UP
bge1: link state changed to UP