Hi people,
For the last couple of days I have been wrestling with a problem that now
looks to me like some kind of OpenBSD issue.
Essentially, we have a PCI Express Digi serial board with a serial
loopback plug connected to one port. I send a few bytes out on that port
and read the data back in on the same port, repeating this as quickly as
possible. The idea is to test the serial port and make sure it's
reliable (which, as it turns out, it isn't). Most of the time I get back
what I write out, but sometimes one or possibly two characters are
dropped, and at the same time dmesg (or /var/log/messages) shows
non-zero silo overflow entries.
E.g.:
Mar 10 15:42:33 bsd /bsd: com4: 1 silo overflow, 0 ibuf overflows
Mar 10 15:50:47 bsd /bsd: com4: 3 silo overflows, 0 ibuf overflows
Mar 10 15:53:27 bsd /bsd: com4: 1 silo overflow, 0 ibuf overflows
Mar 10 16:08:27 bsd /bsd: com4: 1 silo overflow, 0 ibuf overflows
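
To line the drops up against other activity (an rsync, say), the
overflow counters can be pulled out of these log lines mechanically.
The following is only a minimal sketch, assuming the line format is
exactly as in the samples above; parse_overflow_line is a made-up name,
not part of any library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the unit number and both counters from a
 * line such as "com4: 3 silo overflows, 0 ibuf overflows". Any syslog
 * prefix in front of "com" is skipped. Returns 1 on a match, 0 otherwise.
 */
int parse_overflow_line(const char *line, int *unit, int *silo, int *ibuf)
{
    const char *p = line;

    while ((p = strstr(p, "com")) != NULL) {
        /* "overflo%*[ws]" accepts both "overflow" and "overflows". */
        if (sscanf(p, "com%d: %d silo overflo%*[ws], %d ibuf",
                   unit, silo, ibuf) == 3)
            return 1;
        p += 3; /* not a report line here; look for a later "com" */
    }
    return 0;
}
```

Feeding each line of /var/log/messages through this and summing the
silo count per minute would give something to compare against the
Ethernet traffic.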
The characters are dropped seemingly without pattern, but the
interesting thing is that if I generate some traffic over the Ethernet
link (fxp0 in this case), then I see a lot more instances of dropped
characters on the Digi serial port. This suggests to me that there is
some kind of conflict between the Digi serial board and at least the
Ethernet card. Usually one character gets dropped, but sometimes two.
Running the test on more than one of the Digi ports at a time is also a
good way to get dropped characters.
We have tried moving the Digi board to different slots in the machine as
well as different combinations of other Ethernet cards, but the problem
persists. We have also tried a second Digi card of the same model,
without any improvement.
The motherboard based serial port does not drop any characters, just the
Digi.
We have not tried the Digi boards on other operating systems, or on
other releases of OpenBSD.
While it's not the end of the world if the odd character does go
missing on a serial port - our software will actually detect and recover
- it is certainly not desirable behaviour.
If anyone has ideas for things to try I will certainly give them a go.
We did try OpenBSD 5.4 on another machine (with a different
motherboard) and a Digi card, and it also had the dropped characters
issue.
To me it looks like the serial data is going out and coming in OK (it is
not being corrupted on the serial line, for example), but is not making
it through at the software level. I'm wondering if it's some kind of
APIC/MSI/interrupt 'thing', but I don't know enough about OpenBSD to
know for sure. Mere speculation.
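
As a rough back-of-envelope check on how long a receive interrupt could
be delayed before the UART overruns, the arithmetic can be sketched as
below. The assumptions are loud ones: 10 bits on the wire per character
(8N1; the test program does not set the character size, so this is a
guess), the 16 byte FIFO that dmesg reports for the ns16550a, and a
hypothetical receive trigger level of 8 bytes (what the driver really
programs has not been checked). The function names are made up for
illustration.

```c
/* Microseconds needed to receive one character at the given line speed. */
double char_time_us(long baud, int bits_per_char)
{
    return 1e6 * bits_per_char / (double)baud;
}

/*
 * Microseconds for the FIFO to go from the trigger level (where the
 * receive interrupt is raised) to completely full. If nothing drains the
 * FIFO in roughly this time, the next character to arrive is lost.
 */
double fifo_headroom_us(long baud, int bits_per_char,
                        int fifo_depth, int trigger_level)
{
    return (fifo_depth - trigger_level) * char_time_us(baud, bits_per_char);
}

/* Upper bound on throughput in characters per second. */
long max_chars_per_sec(long baud, int bits_per_char)
{
    return baud / bits_per_char;
}
```

Under those assumptions, char_time_us(115200, 10) comes to about 86.8
microseconds and fifo_headroom_us(115200, 10, 16, 8) to about 694
microseconds. max_chars_per_sec(115200, 10) is 11520, which is close to
the roughly 11360 bytes per second the test reports in its good seconds.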
Below is a simplified, much cut-down version of the program that can be
used to demonstrate the problem.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>      /* open() and the O_* flags */
#include <termios.h>
#include <time.h>
#include <sys/file.h>

#define BUFSIZE 40

int main(void)
{
    struct termios t_ideal;
    char buf[BUFSIZE];
    time_t this_second = time(NULL);
    int serial_fd;
    int bytes_in;
    int i;
    int tts = 0; /* Total This Second */

    for (i = 0; i < BUFSIZE; i++)
        buf[i] = 'A';

    /* Open port on Digi board */
    if ((serial_fd = open("/dev/cua04", O_RDWR|O_NDELAY, 0)) < 0)
        exit(-1);
    if (tcgetattr(serial_fd, &t_ideal))
        exit(-1);

    /* Set 115200bps, though problem shows up at lower speeds too */
    if (cfsetospeed(&t_ideal, B115200))
        exit(-1);
    if (cfsetispeed(&t_ideal, B115200))
        exit(-1);

    /* H/W flow control */
    t_ideal.c_iflag &= ~(IXON|IXOFF);
    t_ideal.c_cflag |= CRTSCTS;
    t_ideal.c_lflag &= ~(ICANON|ECHO|ECHOE|ISIG|IEXTEN);

    /* Non-blocking input - will return with zero bytes if nothing there
       to read */
    t_ideal.c_cc[VMIN] = 0;
    t_ideal.c_cc[VTIME] = 0;
    if (tcsetattr(serial_fd, TCSANOW, &t_ideal))
        exit(-1);

    while (1)
    {
        int total_in = 0;
        time_t start_time;

        /* Write out the entire buffer */
        if (write(serial_fd, buf, BUFSIZE) < BUFSIZE)
            exit(-1);
        start_time = time(NULL);

        /* Wait while reading in the entire buffer */
        while ((bytes_in = read(serial_fd, buf, BUFSIZE)) >= 0)
        {
            total_in += bytes_in;

            /* Read in the entire buffer? That's good! */
            if (total_in == BUFSIZE)
                break;

            /* Oh-oh */
            if (time(NULL) > start_time+1)
            {
                printf("Timed out; only received %d bytes! Expected %d\n",
                       total_in, BUFSIZE);
                break;
            }
        }
        if (bytes_in < 0)
            exit(-1);

        /* Show rate for informational purposes and as heartbeat */
        if (time(NULL) > this_second)
        {
            printf("%d bytes this second\n", tts);
            tts = 0;
            this_second = time(NULL);
        }
        tts += total_in;
    }
}
Output, demonstrating the problem (I was performing an rsync on the
Ethernet port at the time):
11360 bytes this second
11360 bytes this second
11320 bytes this second
Timed out; only received 38 bytes! Expected 40
920 bytes this second
11358 bytes this second
Timed out; only received 38 bytes! Expected 40
3560 bytes this second
11358 bytes this second
11360 bytes this second
Output from dmesg from this machine (note silo overflows at end):
OpenBSD 5.4 (GENERIC.MP) #41: Tue Jul 30 15:30:02 MDT 2013
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17121226752 (16328MB)
avail mem = 16657731584 (15886MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0x9f6a4000 (64 entries)
bios0: vendor Intel Corp. version "S3420GP.86B.01.00.0046.092920101143" date 09/29/2010
bios0: Intel Corporation S3420GP
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S5
acpi0: tables DSDT FACP APIC MCFG HPET SLIT SPCR WDDT SSDT SSDT HEST BERT ERST EINJ
acpi0: wakeup devices MRP1(S5) MRP2(S5) MRP3(S4) MRP4(S4) ILAN(S5) EHC2(S5) PEX0(S5) PEX1(S5) PEX2(S5) PEX3(S5) PEX4(S5) PEX6(S5) PEX7(S5) EHC1(S5) IP2P(S5) SLPB(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.71 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
cpu0: apic clock running at 133MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.29 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.29 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.29 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 1 (application processor)
cpu4: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.29 MHz
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu4: 256KB 64b/line 8-way L2 cache
cpu4: smt 1, core 0, package 0
cpu5 at mainbus0: apid 3 (application processor)
cpu5: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.29 MHz
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu5: 256KB 64b/line 8-way L2 cache
cpu5: smt 1, core 1, package 0
cpu6 at mainbus0: apid 5 (application processor)
cpu6: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.29 MHz
cpu6: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu6: 256KB 64b/line 8-way L2 cache
cpu6: smt 1, core 2, package 0
cpu7 at mainbus0: apid 7 (application processor)
cpu7: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz, 2933.29 MHz
cpu7: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC
cpu7: 256KB 64b/line 8-way L2 cache
cpu7: smt 1, core 3, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec00000, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xa0000000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (MRP1)
acpiprt2 at acpi0: bus -1 (MRP3)
acpiprt3 at acpi0: bus 1 (PEX0)
acpiprt4 at acpi0: bus 3 (PEX4)
acpiprt5 at acpi0: bus 4 (PEX6)
acpiprt6 at acpi0: bus 6 (IP2P)
acpicpu0 at acpi0: C3, C1, PSS
acpicpu1 at acpi0: C3, C1, PSS
acpicpu2 at acpi0: C3, C1, PSS
acpicpu3 at acpi0: C3, C1, PSS
acpicpu4 at acpi0: C3, C1, PSS
acpicpu5 at acpi0: C3, C1, PSS
acpicpu6 at acpi0: C3, C1, PSS
acpicpu7 at acpi0: C3, C1, PSS
acpibtn0 at acpi0: SLPB
ipmi at mainbus0 not configured
cpu0: Enhanced SpeedStep 2933 MHz: speeds: 2927, 2926, 2793, 2660, 2527, 2394, 2261, 2128, 1995, 1862, 1729, 1596, 1463, 1330, 1197 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core DMI" rev 0x11
"Intel Core Management" rev 0x11 at pci0 dev 8 function 0 not configured
"Intel Core Scratch" rev 0x11 at pci0 dev 8 function 1 not configured
"Intel Core Control" rev 0x11 at pci0 dev 8 function 2 not configured
"Intel Core Misc" rev 0x11 at pci0 dev 8 function 3 not configured
"Intel Core QPI Link" rev 0x11 at pci0 dev 16 function 0 not configured
"Intel Core QPI Routing" rev 0x11 at pci0 dev 16 function 1 not configured
em0 at pci0 dev 25 function 0 "Intel 82578DM" rev 0x05: msi, address 00:1e:67:06:73:d9
ehci0 at pci0 dev 26 function 0 "Intel 3400 USB" rev 0x05: apic 8 int 21
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb0 at pci0 dev 28 function 0 "Intel 3400 PCIE" rev 0x05: msi
pci1 at ppb0 bus 1
ppb1 at pci1 dev 0 function 0 "PLX PEX 8112" rev 0xaa
pci2 at ppb1 bus 2
puc0 at pci2 dev 0 function 0 "Digi Neo-8" rev 0x02: ports: 8 com
com4 at puc0 port 0 apic 8 int 16: ns16550a, 16 byte fifo
com4: probed fifo depth: 0 bytes
com5 at puc0 port 1 apic 8 int 16: ns16550a, 16 byte fifo
com5: probed fifo depth: 0 bytes
com6 at puc0 port 2 apic 8 int 16: ns16550a, 16 byte fifo
com6: probed fifo depth: 0 bytes
com7 at puc0 port 3 apic 8 int 16: ns16550a, 16 byte fifo
com7: probed fifo depth: 0 bytes
com8 at puc0 port 4 apic 8 int 16: ns16550a, 16 byte fifo
com8: probed fifo depth: 0 bytes
com9 at puc0 port 5 apic 8 int 16: ns16550a, 16 byte fifo
com9: probed fifo depth: 0 bytes
com10 at puc0 port 6 apic 8 int 16: ns16550a, 16 byte fifo
com10: probed fifo depth: 0 bytes
com11 at puc0 port 7 apic 8 int 16: ns16550a, 16 byte fifo
com11: probed fifo depth: 0 bytes
ppb2 at pci0 dev 28 function 4 "Intel 3400 PCIE" rev 0x05
pci3 at ppb2 bus 3
em1 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:1e:67:06:73:d8
ppb3 at pci0 dev 28 function 6 "Intel 3400 PCIE" rev 0x05
pci4 at ppb3 bus 4
vga1 at pci4 dev 0 function 0 "Matrox MGA G200e" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb4 at pci0 dev 28 function 7 "Intel 3400 PCIE" rev 0x05: msi
pci5 at ppb4 bus 5
ehci1 at pci0 dev 29 function 0 "Intel 3400 USB" rev 0x05: apic 8 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb5 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xa5
pci6 at ppb5 bus 6
fxp0 at pci6 dev 0 function 0 "Intel 8255x" rev 0x10, i82551: apic 8 int 16, address 00:0e:0c:6f:98:4e
inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 4
pcib0 at pci0 dev 31 function 0 "Intel 3420 LPC" rev 0x05
pciide0 at pci0 dev 31 function 2 "Intel 3400 SATA" rev 0x05: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using apic 8 int 18 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <WDC WD2003FZEX-00Z4SA0>
wd0: 16-sector PIO, LBA48, 1907729MB, 3907029168 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 6
wd1 at pciide0 channel 1 drive 0: <WDC WD2003FZEX-00Z4SA0>
wd1: 16-sector PIO, LBA48, 1907729MB, 3907029168 sectors
wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 6
ichiic0 at pci0 dev 31 function 3 "Intel 3400 SMBus" rev 0x05: apic 8 int 18
iic0 at ichiic0
iic0: addr 0x18 00=00 01=00 02=00 03=00 04=04 05=41 06=1b 07=0a 08=00 09=00 0a=00 0b=00 0c=00 0d=00 0e=00 0f=00 words 00=007f 01=000c 02=0000 03=0000 04=04c4 05=41d6 06=1b09 07=0a00
iic0: addr 0x19 00=00 01=00 02=00 03=00 04=04 05=41 06=1b 07=0a 08=00 09=00 0a=00 0b=00 0c=00 0d=00 0e=00 0f=00 words 00=007f 01=000c 02=0000 03=0000 04=04c8 05=41db 06=1b09 07=0a00
iic0: addr 0x1a 00=00 01=00 02=00 03=00 04=04 05=41 06=1b 07=0a 08=00 09=00 0a=00 0b=00 0c=00 0d=00 0e=00 0f=00 words 00=007f 01=000c 02=0000 03=0000 04=04c4 05=41a9 06=1b09 07=0a00
iic0: addr 0x1b 00=00 01=00 02=00 03=00 04=04 05=41 06=1b 07=0a 08=00 09=00 0a=00 0b=00 0c=00 0d=00 0e=00 0f=00 words 00=007f 01=000c 02=0000 03=0000 04=04c8 05=41bd 06=1b09 07=0a00
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM registered ECC PC3-10600 with thermal sensor
spdmem1 at iic0 addr 0x51: 4GB DDR3 SDRAM registered ECC PC3-10600 with thermal sensor
spdmem2 at iic0 addr 0x52: 4GB DDR3 SDRAM registered ECC PC3-10600 with thermal sensor
spdmem3 at iic0 addr 0x53: 4GB DDR3 SDRAM registered ECC PC3-10600 with thermal sensor
pciide1 at pci0 dev 31 function 5 "Intel 3400 SATA" rev 0x05: DMA, channel 0 wired to native-PCI, channel 1 wired to native-PCI
pciide1: using apic 8 int 22 for native-PCI interrupt
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
kbc: cmd word write error
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
mtrr: Pentium Pro MTRR support
uhub2 at uhub0 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev0 at uhub2 port 4 configuration 1 interface 0 "NOVATEK USB Keyboard" rev 1.10/1.12 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0 mux 1
wskbd0: connecting to wsdisplay0
uhidev1 at uhub2 port 4 configuration 1 interface 1 "NOVATEK USB Keyboard" rev 1.10/1.12 addr 3
uhidev1: iclass 3/0, 4 report ids
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
uhid1 at uhidev1 reportid 3: input=3, output=0, feature=0
uhid2 at uhidev1 reportid 4: input=2, output=0, feature=0
uhub3 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev2 at uhub3 port 2 configuration 1 interface 0 "American Megatrends Inc. Virtual Keyboard and Mouse" rev 1.10/1.00 addr 3
uhidev2: iclass 3/1
ukbd1 at uhidev2: 8 variable keys, 6 key codes
wskbd1 at ukbd1 mux 1
wskbd1: connecting to wsdisplay0
uhidev3 at uhub3 port 2 configuration 1 interface 1 "American Megatrends Inc. Virtual Keyboard and Mouse" rev 1.10/1.00 addr 3
uhidev3: iclass 3/1
ums0 at uhidev3: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
uftdi0 at uhub3 port 4 "Crystalfontz Crystalfontz CFA631-USB LCD" rev 2.00/6.00 addr 4
ucom0 at uftdi0 portno 1
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
scsibus1 at softraid0: 256 targets
root on wd0a (ff0043e89f189b7d.a) swap on wd0b dump on wd0b
com4: 1 silo overflow, 0 ibuf overflows
com4: 2 silo overflows, 0 ibuf overflows
com4: 6 silo overflows, 0 ibuf overflows
com4: 3 silo overflows, 0 ibuf overflows
com4: 1 silo overflow, 0 ibuf overflows
com4: 1 silo overflow, 0 ibuf overflows
com4: 1 silo overflow, 0 ibuf overflows
com4: 2 silo overflows, 0 ibuf overflows
com4: 1 silo overflow, 0 ibuf overflows
com4: 1 silo overflow, 0 ibuf overflows
com4: 3 silo overflows, 0 ibuf overflows
com4: 1 silo overflow, 0 ibuf overflows
com4: 1 silo overflow, 0 ibuf overflows
Much searching of the Net did not turn up any solution, or anyone else
having this problem. Surely we're not alone? I didn't see any errata
for 5.4 that mention anything like this. I'm now at a loss.
Ideas are welcome...
-Martin
--
R A Ward Ltd. | We take the privacy of our customers seriously.
Christchurch | All sensitive E-Mail attachments MUST be encrypted.
New Zealand