Re: is this rge crash known? - fixed

2022-12-19 Thread Geoff Steckel

On 12/19/22 21:07, Kevin Lo wrote:

On Mon, Dec 19, 2022 at 03:50:45PM -0500, Geoff Steckel wrote:

Thanks for all the suggestions:

sysctl kern.pool_debug=1 = no change
known working board in same slot = no change

hardware version is indeed 0609
em(4) in same slot = works
test using old rge(4) board between two Linux systems = works

Are any other drivers similar enough for me to compare with if_rge.c?
Perhaps the AMD 5600G or the B550 chipset have quirks not seen before?

I could possibly install FreeBSD if that would give any information.

The diff below syncs up the Rx descriptor setup code to the upstream.
It should fix the problem.

Index: sys/dev/pci/if_rge.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_rge.c,v
retrieving revision 1.20
diff -u -p -u -p -r1.20 if_rge.c
--- sys/dev/pci/if_rge.c	20 Nov 2022 23:47:51 -0000	1.20
+++ sys/dev/pci/if_rge.c	20 Dec 2022 01:54:30 -0000
@@ -1104,24 +1104,16 @@ rge_newbuf(struct rge_queues *q)
 	/* Map the segments into RX descriptors. */
 	r = &q->q_rx.rge_rx_list[idx];
 
-	if (RGE_OWN(r)) {
-		printf("%s: tried to map busy RX descriptor\n",
-		    sc->sc_dev.dv_xname);
-		m_freem(m);
-		return (ENOBUFS);
-	}
-
 	rxq->rxq_mbuf = m;
 
-	r->rge_extsts = 0;
-	r->rge_addrlo = htole32(RGE_ADDR_LO(rxmap->dm_segs[0].ds_addr));
-	r->rge_addrhi = htole32(RGE_ADDR_HI(rxmap->dm_segs[0].ds_addr));
+	r->hi_qword1.rx_qword4.rge_extsts = 0;
+	r->hi_qword0.rge_addr = htole64(rxmap->dm_segs[0].ds_addr);
 
-	r->rge_cmdsts = htole32(rxmap->dm_segs[0].ds_len);
+	r->hi_qword1.rx_qword4.rge_cmdsts = htole32(rxmap->dm_segs[0].ds_len);
 	if (idx == RGE_RX_LIST_CNT - 1)
-		r->rge_cmdsts |= htole32(RGE_RDCMDSTS_EOR);
+		r->hi_qword1.rx_qword4.rge_cmdsts |= htole32(RGE_RDCMDSTS_EOR);
 
-	r->rge_cmdsts |= htole32(RGE_RDCMDSTS_OWN);
+	r->hi_qword1.rx_qword4.rge_cmdsts |= htole32(RGE_RDCMDSTS_OWN);
 
 	bus_dmamap_sync(sc->sc_dmat, q->q_rx.rge_rx_list_map,
 	    idx * sizeof(struct rge_rx_desc), sizeof(struct rge_rx_desc),
@@ -1140,11 +1132,11 @@ rge_discard_rxbuf(struct rge_queues *q,
 
 	r = &q->q_rx.rge_rx_list[idx];
 
-	r->rge_cmdsts = htole32(RGE_JUMBO_FRAMELEN);
-	r->rge_extsts = 0;
+	r->hi_qword1.rx_qword4.rge_cmdsts = htole32(RGE_JUMBO_FRAMELEN);
+	r->hi_qword1.rx_qword4.rge_extsts = 0;
 	if (idx == RGE_RX_LIST_CNT - 1)
-		r->rge_cmdsts |= htole32(RGE_RDCMDSTS_EOR);
-	r->rge_cmdsts |= htole32(RGE_RDCMDSTS_OWN);
+		r->hi_qword1.rx_qword4.rge_cmdsts |= htole32(RGE_RDCMDSTS_EOR);
+	r->hi_qword1.rx_qword4.rge_cmdsts |= htole32(RGE_RDCMDSTS_OWN);
 
 	bus_dmamap_sync(sc->sc_dmat, q->q_rx.rge_rx_list_map,
 	    idx * sizeof(struct rge_rx_desc), sizeof(struct rge_rx_desc),
@@ -1219,8 +1211,8 @@ rge_rxeof(struct rge_queues *q)
 		if (RGE_OWN(cur_rx))
 			break;
 
-		rxstat = letoh32(cur_rx->rge_cmdsts);
-		extsts = letoh32(cur_rx->rge_extsts);
+		rxstat = letoh32(cur_rx->hi_qword1.rx_qword4.rge_cmdsts);
+		extsts = letoh32(cur_rx->hi_qword1.rx_qword4.rge_extsts);
 
 		total_len = RGE_RXBYTES(cur_rx);
 		rxq = &q->q_rx.rge_rxq[i];
@@ -1282,16 +1274,16 @@ rge_rxeof(struct rge_queues *q)
 		    (total_len - ETHER_CRC_LEN);
 
 		/* Check IP header checksum. */
-		if (!(rxstat & RGE_RDCMDSTS_IPCSUMERR) &&
+		if (!(extsts & RGE_RDEXTSTS_IPCSUMERR) &&
 		    (extsts & RGE_RDEXTSTS_IPV4))
 			m->m_pkthdr.csum_flags |= M_IPV4_CSUM_IN_OK;
 
 		/* Check TCP/UDP checksum. */
 		if ((extsts & (RGE_RDEXTSTS_IPV4 | RGE_RDEXTSTS_IPV6)) &&
-		    (((rxstat & RGE_RDCMDSTS_TCPPKT) &&
-		    !(rxstat & RGE_RDCMDSTS_TCPCSUMERR)) ||
-		    ((rxstat & RGE_RDCMDSTS_UDPPKT) &&
-		    !(rxstat & RGE_RDCMDSTS_UDPCSUMERR))))
+		    (((extsts & RGE_RDEXTSTS_TCPPKT) &&
+		    !(extsts & RGE_RDEXTSTS_TCPCSUMERR)) ||
+		    ((extsts & RGE_RDEXTSTS_UDPPKT) &&
+		    !(extsts & RGE_RDEXTSTS_UDPCSUMERR))))
 			m->m_pkthdr.csum_flags |= M_TCP_CSUM_IN_OK |
			    M_UDP_CSUM_IN_OK;
 
Index: sys/dev/pci/if_rgereg.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
retrieving revision 1.8
diff -u -p -u -p -r1.8 if_rgereg.h
--- sys/dev/pci/if_rgereg.h	20 Nov 2022 23:47:51 -0000	1.8

Re: is this rge crash known? - another test result

2022-12-19 Thread Geoff Steckel

OpenBSD 7.2 on an ASRock J4105M shows the same crash, only much sooner (8G vs 32G of memory?)
Geoff Steckel



Re: is this rge crash known? - test results

2022-12-19 Thread Geoff Steckel

Thanks for all the suggestions:

sysctl kern.pool_debug=1 = no change
known working board in same slot = no change

hardware version is indeed 0609
em(4) in same slot = works
test using old rge(4) board between two Linux systems = works

Are any other drivers similar enough for me to compare with if_rge.c?
Perhaps the AMD 5600G or the B550 chipset have quirks not seen before?

I could possibly install FreeBSD if that would give any information.

thanks again,
Geoff Steckel




is this rge crash known?

2022-12-18 Thread Geoff Steckel

Running nc to stream zeros from one rge(4) to another at full speed crashes
in the input interrupt path with corruption of the memory pool used
for the mbufs.
It's 100% reproducible.
It's probably a race condition and use-after-free or some such,
since it takes 200,000+ packets to happen.
I suspect that the crash happens when the corruption is detected
some time after it actually occurs.

This is a ---very--- abbreviated description.
If this crash hasn't been seen before I'll submit a full bug report.

Is there any more info from sysctls, ddb, etc. that would help?
I can put in breakpoints & dump (small) memory areas.
If running the most recent snapshot would give better info I can do that.
A serial console to get an exact transcript is possible but not easy.

Any suggestions of something I can do to help beyond a standard bug
report welcomed. I can run test patches easily.

This is with the standard 1500 mtu.
Setting mtu to 8000 trashes memory enough to cause a kernel protection 
fault.


OpenBSD 7.2 (GENERIC.MP) #758: Tue Sep 27 11:57:54 MDT 2022
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 67997949952 (64847MB)
avail mem = 65919754240 (62865MB)

bios0 at mainbus0: SMBIOS rev. 3.3 @ 0xe6cc0 (33 entries)
bios0: vendor American Megatrends International, LLC. version "P2.10" 
date 08/02/2021

bios0: ASRock B550 Phantom Gaming 4
...
cpu0: AMD Ryzen 5 5600G with Radeon Graphics, 3892.77 MHz, 19-50-00
...
rge0 at pci3 dev 0 function 0 "Realtek RTL8125" rev 0x00: msi, address 
78:2d:7e:12:5a:d6


panic hand copied:
sched_idle
apicpu_idle
Xintr_ioapic_edge22_???
intr_handler
rge_intr
rge_rxeof
rge_newbuf
mclget(0, 2, 2400)
pool_get
pool_cache_get
panic: pool cache item mcl9k cp freelist modified
(panic on test 1)
fd800b076880 + 16 0x64000403b8f3400 != 0x46b8689556980a7
(panic on test 2 - all previous data identical)
fd800b118d40 + 16 0x64000409da03400 != 0x5be8fd0cf17429b

thanks for any input,
Geoff Steckel



Re: allow 240/4 in various network daemons

2022-05-28 Thread Geoff Steckel




On 5/28/22 5:22 PM, Crystal Kolipe wrote:


There certainly are people using this behaviour of the loopback address(es)
in creative ways on non-OpenBSD systems:

https://timkay.com/solo/

Changing it on those systems will likely break various users' scripts in
unexpected ways.

The script linked above doesn't work in its original form on OpenBSD
precisely because of the different behaviour of the 127/8 subnet.

Not that I really care, because a solution to the _real_ problem already
exists.  It's called IPv6.


True. But I would have to buy access to a VPN because
my service provider hasn't extended IPv6 service to my location.
Hurricane Electric isn't useful because some sites care about geography
and they can't determine where a HE user is located.



ether_output broadcast question

2015-04-25 Thread Geoff Steckel

In if_ethersubr.c @ line 304:

	switch (dst->sa_family) {
	case AF_INET:
		error = arpresolve(ac, rt, m, dst, edst);
		if (error)
			return (error == EAGAIN ? 0 : error);
		/* If broadcasting on a simplex interface, loopback a copy */
		if ((m->m_flags & M_BCAST) && (ifp->if_flags & IFF_SIMPLEX) &&
		    !m->m_pkthdr.pf.routed)
			mcopy = m_copy(m, 0, (int)M_COPYALL);
		etype = htons(ETHERTYPE_IP);
		break;
#ifdef INET6
	case AF_INET6:
		error = nd6_storelladdr(ifp, rt, m, dst, (u_char *)edst);
		if (error)
			return (error);
		etype = htons(ETHERTYPE_IPV6);
		break;

@ 374

	/* XXX Should we feed-back an unencrypted IPsec packet ? */
	if (mcopy)
		(void) looutput(ifp, mcopy, dst, rt);

Does the loopback for a simplex interface for IPv6 happen somewhere else?

thanks!
Geoff Steckel



httpd: multiple addresses for one server

2015-01-01 Thread Geoff Steckel

Is there any way to do the equivalent of:

server an.example.com
listen on 192.168.2.99
listen on 2001:fefe:1:1::99

??
It appears that the code in parse.y explicitly forbids this
and the data structures for a server don't *seem*
to have more than one slot for an address.

Is there another way to achieve this effect?
From one comment in the checkins, it looks like

server an.example.com
listen on 192.168.2.99
.
server an.example.com
listen on 2001:fefe:1:1::99

would work.

Duplicating the entire server description is
difficult to maintain.

Is someone planning to work in this area soon?

thanks
Geoff Steckel



traceroute6 crossing rdomains

2014-12-02 Thread Geoff Steckel

My configuration drops ICMP6_TIME_EXCEEDED crossing rdomains.
I can't find a problem with the setup.
If this is my fault, please tell me.

I have an IP6 connection via SIXXS. I put gif0 in its own rdomain
so I could isolate the tunnel endpoint addresses.

pf.conf:

# outgoing from internals
pass out quick on lo2 \
inet6 \
to ! $net6 \
rtable 1 \
label lo2out

pass in quick on gif0 \
inet6 \
to valid6 \
rtable 0 \
label ip6in

river:gwes:5720$ netstat -rn -f inet6
Routing tables

Internet6:
Destination                  Gateway                 Flags   Refs      Use   Mtu  Prio Iface
::/104                       ::1                     UGRS       0        0     -     8 lo0
::/96                        ::1                     UGRS       0        0     -     8 lo0
default                      ::2                     UGS      739    22876     -     8 lo2
[paths to local hosts omitted]

river:gwes:5724$ netstat -T 1 -rn -f inet6
Routing tables

Internet6:
Destination                  Gateway                 Flags   Refs      Use   Mtu  Prio Iface
default                      2001:4830:1100:2db::1   UGS        0    22952     -     8 gif0
::1                          link#7                  UHL        0        0     -     4 lo0

ping6 to any external host works
traceroute6 using ICMP6 ECHO works
traceroute6 using UDP returns nothing

I can see the TIME_EXCEEDED packets coming in gif0 using tcpdump
I can't see them after that. They seem to disappear somewhere in
PF leaving no trace.

My first thought is that the outgoing state is marked with rdomain 0.
The returned packet is marked with rdomain 1.
It looks like ECHO packets and TIME_EXCEEDED packets go through different
paths in incoming state matching.
It looks like TIME_EXCEEDED packets can't match because of
the different rdomains and therefore get dropped invisibly.

Comments? Flames? RTFMs?

thanks
Geoff Steckel



Re: Unnecessary mmap flags?

2014-06-26 Thread Geoff Steckel

On 06/26/2014 03:28 PM, Matthew Dempsky wrote:

I just reviewed our mmap(2) flags to compare them against Linux,
FreeBSD, Solaris, and Darwin's flags.  Of the flags listed below, none
of them are specified by POSIX, and none of them do anything
interesting on OpenBSD: MAP_COPY just gets rewritten to MAP_PRIVATE,
and the rest are silently ignored by UVM.

Linux   FreeBSD Solaris Darwin
MAP_COPYno  YES*no  YES*
MAP_RENAME  no  YES*YES*YES*
MAP_NORESERVE   YES YES*YES YES*
MAP_INHERIT no  YES**   no  no
MAP_NOEXTENDno  no  no  YES*
MAP_HASSEMAPHOREno  YES***  no  YES***
MAP_TRYFIXEDno  no  no  no

* These are defined in the OS's sys/mman.h, but are undocumented in
their mmap(2) manual, and behave the same as on OpenBSD (i.e.,
MAP_COPY is an alias for MAP_PRIVATE; the others have no effect).

** MAP_INHERIT is documented in FreeBSD's mmap(2) manual (as "This
flag never operated as advertised", which is true on OpenBSD too), but
not defined in their sys/mman.h.

*** MAP_HASSEMAPHORE is documented in FreeBSD and Darwin's mmap(2)
manuals and defined in their sys/mman.h, but has no effects on
either's VM behavior.


So MAP_NORESERVE is perhaps necessary/worth keeping around, but the
others seem like candidates for removal if nothing in ports needs
them.

MAP_HASSEMAPHORE is used in rthread_sem.c, but it doesn't do anything,
so I suspect it's just cargo culting based on man page misinformation?
Are there architectures that actually have restrictions on semaphore
memory?

Sun allowed requests for shared memory to be uncached. This was used
for DMA and (IIRC) interprocessor communication since adding sufficient
memory barriers was painful. I don't remember how to make that request.

Geoff Steckel



Re: em(4): Don't count RX overruns and missed packets as input errors

2014-02-12 Thread Geoff Steckel

On 02/12/2014 05:44 AM, Mike Belopuhov wrote:

On 11 February 2014 20:05, Brad Smith b...@comstyle.com wrote:

On Tue, Feb 11, 2014 at 07:43:51PM +0100, Mark Kettenis wrote:

Date: Tue, 11 Feb 2014 13:30:47 -0500
From: Brad Smith b...@comstyle.com


Index: arch/socppc/dev/if_tsec.c
===
RCS file: /home/cvs/src/sys/arch/socppc/dev/if_tsec.c,v
retrieving revision 1.29
diff -u -p -u -p -r1.29 if_tsec.c
--- arch/socppc/dev/if_tsec.c 29 Nov 2012 21:10:31 -  1.29
+++ arch/socppc/dev/if_tsec.c 28 Jan 2014 05:16:24 -
@@ -779,7 +779,6 @@ tsec_errintr(void *arg)
*/
   tsec_rx_proc(sc);
   tsec_write(sc, TSEC_RSTAT, TSEC_RSTAT_QHLT);
- ifp-if_ierrors++;
   }

   return (1);


This one doesn't seem right.  This is the only place where the driver
actually increases if_ierrors.

Being the only place input errors are incremented is irrelevant.
It's being incremented because the particular error is a FIFO overrun.


I also still fundamentally disagree with the direction.  If you guys
really want to make a distinction between packets dropped because
we're out of descriptors and packets that were not correctly received
for other reasons, add a counter for that first and then change the
drivers.

I don't necessarily disagree with what you have said. I think we should
have some additional counters to deal with some of the counters we
are lumping into error counters.

Since we can't seem to come to any consensus about how to deal with
this I'm going to revert the bge(4) commit in question.


no way.  counting drops caused by the mclgeti should not be accounted
as input errors.  it makes it hard to debug things.




Locating where and why packets are dropped is essential to debugging network
overload problems. A per-interface "packets dropped because of lack of
resources" counter would make that process easier. There used to be several
places in packet reception where packets were dropped with no record.
I inserted counters to get at least a global count. It's conceptually very
ugly to silently fail in a vital part of the network stack. It's definitely
a frustrating inconvenience.
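
A rough sketch of the kind of split being argued for (not a patch to any
particular driver; if_iqdrops is used here only as an example of an existing
per-interface drop counter, and the flag names are placeholders):

#include <sys/param.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/*
 * Account genuine receive errors and pure resource-shortage drops in
 * different per-interface counters, so an overloaded box can be told
 * apart from a broken link.
 */
void
rx_account(struct ifnet *ifp, int hw_error, int no_resources)
{
	if (hw_error)
		ifp->if_ierrors++;	/* CRC error, FIFO overrun, ... */
	else if (no_resources)
		ifp->if_iqdrops++;	/* out of mbufs or descriptors */
}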

Geoff Steckel




Re: help needed from someone with an sk(4)

2014-01-26 Thread Geoff Steckel

On 01/24/2014 02:09 PM, Henning Brauer wrote:

* Ted Unangst t...@tedunangst.com [2014-01-24 17:48]:


Are people still using sk, gem, or hme (!) in pps performance critical
situations?

doesn't make sense to do so, and hasn't in a long time...

Performance critical? Well, given the pathetic speeds available in
the USA for less than a small fortune, sk works well and was cheap.
Yes, this is an ancient release and an ancient/snail slow CPU...
it works and the system uses about 15W with only 1 tiny fan.
At max load (routing 20 mbit and forwarding 200 mbit or so) the
CPU is 25% loaded.

[Why an ancient release?

sysmerge didn't handle any of the crucial update cases (pf changes 
especially)

at all well. It didn't differentiate release-to-release changes from user
changes. Maybe it does now with diff3 - that's what I use.
(old release /etc - current system /etc - new release /etc).
This box will probably go to 5.5 or 5.4 current - not significantly less
work than a single release update.]

OpenBSD 5.2 (GENERIC) #278: Wed Aug  1 10:04:16 MDT 2012
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: VIA Esther processor 1500MHz (CentaurHauls 686-class) 1.51 GHz
cpu0: 
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,APIC,SEP,MTRR,PGE,CMOV,PAT,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,TM,SBF,NXE,SSE3,EST,TM2

real mem  = 1005056000 (958MB)
avail mem = 96640 (932MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 05/16/06, BIOS32 rev. 0 @ 0xfb570, 
SMBIOS rev. 2.3 @ 0xf (34 entries)

bios0: vendor Phoenix Technologies, LTD version 6.00 PG date 05/16/2006
apm0 at bios0: Power Management spec V1.2 (slowidle)
acpi at bios0 function 0x0 not configured
mpbios0 at bios0: Intel MP Specification 1.4
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: RNG AES AES-CTR SHA1 SHA256 RSA
cpu0: apic clock running at 100MHz
mpbios0: bus 0 is type PCI
mpbios0: bus 1 is type PCI
mpbios0: bus 2 is type ISA
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 3, 24 pins
pcibios0 at bios0: rev 2.1 @ 0xf/0xdc84
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdbb0/208 (11 entries)
pcibios0: bad IRQ table checksum
pcibios0: PCI BIOS has 11 Interrupt Routing table entries
pcibios0: PCI Exclusive IRQs: 5 9 10 11
pcibios0: PCI Interrupt Router at 000:17:0 (VIA VT8237 ISA rev 0x00)
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc/0xfe00 0xd/0x5000!
cpu0: unknown Enhanced SpeedStep CPU, msr 0x08100f1308000f13
cpu0: using only highest and lowest power states
cpu0: Enhanced SpeedStep 1501 MHz: speeds: 1500, 800 MHz
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 VIA CN700 Host rev 0x00
viaagp0 at pchb0: v3
agp0 at viaagp0: aperture at 0xe800, size 0x1000
pchb1 at pci0 dev 0 function 1 VIA CN700 Host rev 0x00
pchb2 at pci0 dev 0 function 2 VIA CN700 Host rev 0x00
pchb3 at pci0 dev 0 function 3 VIA PT890 Host rev 0x00
pchb4 at pci0 dev 0 function 4 VIA CN700 Host rev 0x00
pchb5 at pci0 dev 0 function 7 VIA CN700 Host rev 0x00
ppb0 at pci0 dev 1 function 0 VIA VT8377 AGP rev 0x00
pci_intr_map: bus 0 dev 1 func 0 pin 1; line 10
pci_intr_map: no MP mapping found
pci_intr_map: bus 0 dev 1 func 0 pin 2; line 11
pci_intr_map: no MP mapping found
pci_intr_map: bus 0 dev 1 func 0 pin 3; line 9
pci_intr_map: no MP mapping found
pci_intr_map: bus 0 dev 1 func 0 pin 4; line 5
pci_intr_map: no MP mapping found
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 VIA S3 Unichrome PRO IGP rev 0x01
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
skc0 at pci0 dev 8 function 0 D-Link DGE-530T A1 rev 0x11, Yukon 
(0x1): apic 2 int 17

sk0 at skc0 port A: address 00:0d:88:c8:2b:c8
eephy0 at sk0 phy 0: 88E1011 Gigabit PHY, rev. 3
VIA VT6306 FireWire rev 0x80 at pci0 dev 10 function 0 not configured
re0 at pci0 dev 11 function 0 Realtek 8169 rev 0x10: RTL8169/8110SCd 
(0x1800), apic 2 int 19, address 00:30:18:a8:10:76

rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 2
pciide0 at pci0 dev 15 function 0 VIA VT6420 SATA rev 0x80: DMA
pciide0: using apic 2 int 20 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: STT_SX16B6XSL
wd0: 1-sector PIO, LBA48, 15272MB, 31277232 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 6
pciide1 at pci0 dev 15 function 1 VIA VT82C571 IDE rev 0x06: ATA133, 
channel 0 configured to compatibility, channel 1 configured to compatibility

atapiscsi0 at pciide1 channel 0 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: TSSTcorp, CDW/DVD SH-M522C, TS01 ATAPI 
5/cdrom removable

cd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide1: channel 1 disabled (no drives)
uhci0 at pci0 dev 16 function 0 VIA VT83C572 USB rev 0x81: apic 2 int 21
uhci1 at pci0 dev 16 function 1 VIA VT83C572 USB rev 0x81: apic 2 int 21
uhci2 at pci0 dev 16 function 2 VIA VT83C572 USB rev 0x81: apic 2 int 21
uhci3 at pci0 dev 16 function 3 VIA VT83C572 USB rev 0x81: apic 2 int 21
viapm0 at pci0 dev 17 function 0 

Re: Security and ignorance from the major ISPs

2013-02-14 Thread Geoff Steckel

 On 02/14/2013 06:40 PM, Ryan Freeman wrote:

On Thu, Feb 14, 2013 at 04:20:30PM -0700, Daniel Bertrand wrote:

Hello,

Thanks for providing such great software. It really is much appreciated.

I was wondering what your stance is about the constant hack attempts on 
machines on our ISP networks..

I see CONSTANT scanning for ports from all over the world, mostly from Italy, 
Russia, and China.

yeah i see this daily.  doesn't matter, they never get anywhere.


Every firewall/router product that I have purchased has been compromised so far.

Is there really a secure, trustworthy adaptive filtering firewall configuration 
for each OS configuration out there?



block all is the right place to start from

adaptive??? adapt to what??? if you can't define that,
the problem is undefined and impossible to solve.

then: what unsolicited incoming packets do you want?
for user systems, (especially u$soft ones) NONE.
for server systems, what services do you need to expose?
only the incoming ports associated with those services.
you should consider allowing ICMP echo (ping) and traceroute
(UDP) to your domain.xxx and/or www.domain.xxx addresses.

outgoing connections? from secure systems, allow all is
probably OK - though blocking packets to useless
privileged ports is probably a good idea.

there is very little you can do if a system inside your
firewall is infected. subnets can help you: for instance,
only allowing requests to port 80 from users will defeat
some portion of dangerous packets. any use of micro$oft
inside your firewall is very dangerous - that software
is, by architecture and design, impossible to make secure.
any system using u$soft software must be regarded as
insecure at all times. Programs like skype, etc. which
use proprietary packet formats are inherently security
failures.

The internet is an extremely hostile place.
need to know and need to access must be your rules.
block everything else.
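
As a rough illustration only (the $ext_if, $www_addr and $lan_net macros
below are placeholders, not a tested ruleset), the policy above comes
down to something like:

# deny everything, then open only what is needed
block all
pass out on $ext_if from $lan_net                  # outbound from inside
pass in on $ext_if proto tcp to $www_addr port { 80 443 }
pass in on $ext_if inet proto icmp to $www_addr icmp-type echoreq
pass in on $ext_if inet proto udp to $www_addr port 33434:33523  # traceroute probes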

geoff steckel



Re: Small change to let mg handle localized characters

2012-08-23 Thread Geoff Steckel

On 08/23/2012 03:50 PM, Stefan Sperling wrote:

On Thu, Aug 23, 2012 at 08:58:53PM +0200, Eivind Evensen wrote:

Since version 1.10 of lib/libc/gen/ctype_.c, I've been
unable to use localized characters in mg properly (they're printed
as an octal value only).

I've been using the below change to regain support for printing them
normally.

Best regards, Eivind Evensen


Index: main.c
===
RCS file: /data/openbsd/src/usr.bin/mg/main.c,v
.
Eivind

This kind of change has been proposed before.
In my opinion it is not the right way of solving this problem.

It won't work correctly with multi-byte files (like UTF-8). E.g. typing
backspace to delete one character will delete one byte instead of the
entire character, which messes up the display. To properly support multi-byte
encodings mg needs to learn the difference between a byte and a character.

The locales mechanism and wchar_t are only useful for applications that do
not care about details of character encodings, and which only need to deal
with a single character set at a time. It is not very useful for editors
because they need to handle files in various encodings and be aware of
the current encoding in use.

Some applications in base (less and tmux, for example) have special
support code for UTF-8. This could be done for mg as well, so that
it can support single-byte character sets (ASCII, latin1) and also
UTF-8 (but no other multi-byte character set). You'd activate the
special UTF-8 mode if nl_langinfo(CODESET) returns UTF-8.

To properly support arbitrary multi-byte character sets (UTF-8, UTF-16,
special asian language encodings etc) mg needs iconv which we don't have
in base. I have some work-in-progress iconv code but it's not ready for
the tree yet and I'm not actively working on it at the moment.
If you want to help out with this let me know.

Using iconv in an editor is EXTREMELY dangerous without complex precautions.
Given a file containing characters not valid in the current locale,
it will at minimum prevent viewing the file.
If the file is written out, the file is destroyed.
IMnsHO, that is fatally flawed.

Returning an error for an impossible character translation is specified
in the archaic version of the Unicode standard I read.
That is not useful in an editor.

Geoff Steckel



Re: Small change to let mg handle localized characters

2012-08-23 Thread Geoff Steckel

On 08/23/2012 06:55 PM, Stefan Sperling wrote:

On Thu, Aug 23, 2012 at 05:32:51PM -0400, Geoff Steckel wrote:

Using iconv in an editor is EXTREMELY dangerous without complex precautions.
Given a file containing characters not valid in the current locale,
it will at minimum prevent viewing the file.

An editor needs to convert between character sets.
How else are you going to display a latin1 file in a UTF-8 locale,
for example?

If the current character set of the locale cannot display your file
because conversion from file source encoding to output encoding fails,
tough, you'll have display problems. What else is an application
supposed to do in this case? It's being asked to do something impossible.

BTW, vim links to libiconv. For some bizarre reason emacs links to
libossaudio instead ;)


If the file is written out, the file is destroyed.
IMnsHO, that is fatally flawed.

Well, yes, using a character set conversion API in stupid ways can
munge data. How does that relate to anything I was saying?

As long as iconv is only used to display data, not to change file
contents, you're perfectly right.

A real example is a L***x editor using iconv. Open a 5000 line file,
change line 100, line 500 contains a non-conforming character,
file is truncated there.

Not pretty.

Another real example. Bring up line containing non-conforming character.
Line appears blank.

I agree that it takes a great deal of care to implement a multi-character
set editor such that it works on all useful files while displaying in
a particular locale's character set.

Geoff Steckel



Re: CoDel

2012-08-15 Thread Geoff Steckel
On 08/15/2012 09:07 AM, Henning Brauer wrote:
 * Simon Perreault sperrea...@openbsd.org [2012-08-15 14:38]:
 On 2012-08-14 20:29, David Gwynne wrote:
 ill have to fix that.
 Oh, and I also realized that CBQ is even worse: it calls
 microuptime() on every enqueue *and* every dequeue!

 If it really was a bug, people would have noticed, no?
 altq being slow has and is being noticed.

To make a commercial version of altq work usefully on FreeBSD, the 
simplest and most effective solution was to increase HZ to something 
like 1000 and remove almost all calls to xxxuptime().

On slowish (under 1GHz) Pentium boards, the idle interrupt time was 
1-2%. That was deemed acceptable. Slow interrupt processing for Ethernet 
interfaces was far more painful and important to fix.

Geoff Steckel



sysv_msg queue id error

2012-02-03 Thread Geoff Steckel

Index: sys/kern/sysv_msg.c
===================================================================
RCS file: /cvs/src/sys/kern/sysv_msg.c,v
retrieving revision 1.24
diff -u -r1.24 sysv_msg.c
--- sys/kern/sysv_msg.c	20 May 2011 16:06:25 -0000	1.24
+++ sys/kern/sysv_msg.c	3 Feb 2012 23:58:45 -0000
@@ -230,7 +230,7 @@
 		goto again;
 
 found:
-	*retval = que->que_id;
+	*retval = IXSEQ_TO_IPCID(0, que->msqid_ds.msg_perm);
 	return (error);
 }
 
@@ -421,7 +421,7 @@
 	struct que *que;
 
 	TAILQ_FOREACH(que, &msg_queues, que_next)
-		if (que->que_id == id)
+		if (que->msqid_ds.msg_perm.seq == IPCID_TO_SEQ(id))
 			break;
 
 	/* don't return queues marked for removal */
Index: sys/sys/msg.h
===================================================================
RCS file: /cvs/src/sys/sys/msg.h,v
retrieving revision 1.16
diff -u -r1.16 msg.h
--- sys/sys/msg.h	3 Jan 2011 23:08:07 -0000	1.16
+++ sys/sys/msg.h	3 Feb 2012 23:58:45 -0000
@@ -62,7 +62,6 @@
 
 struct que {
 	struct msqid_ds	msqid_ds;
-	int		que_id;
 	int		que_flags;
 	int		que_references;

que_id was read to return msgq ids and to search for active msgqs but
was never set. msqid_ds.msg_perm.seq was set but was only read by
user space programs.

Result: only 1 msg queue could be used because queue id 0 was the
only one returned and 0 matched the first queue.
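
A minimal userland check for that symptom (a hypothetical test program,
not part of the diff):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	/* two private queues must come back with distinct ids */
	int a = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
	int b = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

	if (a == -1 || b == -1)
		err(1, "msgget");
	printf("queue ids: %d and %d (%s)\n", a, b,
	    a == b ? "collide - bug present" : "distinct - fixed");
	msgctl(a, IPC_RMID, NULL);
	if (b != a)
		msgctl(b, IPC_RMID, NULL);
	return 0;
}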

Note: while 32767 semaphore sets, message queues, and SYSV shared memory
segments are probably enough for 5 or 10 years, the use of 0x7FFF, etc.
as masks is ugly. The sysv code is slightly inconsistent in that some
use an array for state and some use tailq. RBtree could be used instead.

I'd be glad to change this but I'm really more concerned with getting
very high performance for an application that's due for a demo next week.



Re: amd64 big dma patches

2012-01-03 Thread Geoff Steckel

On 01/03/2012 01:28 PM, Gregor Best wrote:

On my system, the patch causes wpi to timeout during firmware upload,
resulting in a non-working WiFi card.

The dmesg doesn't say anything more besides that. Is there anything I
can do to provide more useful data?

--
 Gregor Best

[demime 1.01d removed an attachment of type application/pgp-signature]

Thanks! I think my assumption to use 64 bit addressing as the default
was far too aggressive. There are more PCI devices which are 32bit only than
I had hoped. I'll rewrite it and check it on my systems. If there's an
obvious fix for wpi I'll send it, but pciide is the worst and requires a
rethink. My apologies!



Re: raise max value for tcp autosizing buffer [WAS: misc@ network tuning for high bandwidth and high latency]

2011-12-05 Thread Geoff Steckel

On 12/05/2011 05:25 AM, Kevin Chadwick wrote:

On Mon, 05 Dec 2011 10:53:00 +0100
Sebastian Reitenbach wrote:


So to be able to shoot myself in the foot without the need to compile the 
kernel, I'll look into adding a sysctl to tweak the maximum size of the buffer. 
Well, depending on time and how fast I figure out how to do that, might take 
some time.


I don't know the best word but what about something like
net.inet.tcp.footshooter.

You'd have to be pretty dumb to write into the mailing list saying

"I don't know what's wrong. The only changes I've made are to increase
performance by tweaking net.inet.tcp.footshooter.sbmax"?
footshooter.net.sbmax perhaps? Such a hierarchy could be populated with 
all the parameters it's, umm, unwise to tweak without a lot of 
knowledge. A 90% frivolous suggestion.


Geoff Steckel



Re: raise max value for tcp autosizing buffer [WAS: misc@ network tuning for high bandwidth and high latency]

2011-12-04 Thread Geoff Steckel

On 12/04/2011 09:10 AM, Claudio Jeker wrote:

On Sun, Dec 04, 2011 at 01:35:33PM +0100, Sebastian Reitenbach wrote:

On Sunday, December 4, 2011 13:24 CET, Camiel Dobbelaar c...@sentia.nl wrote:


On 4-12-2011 13:01, Sebastian Reitenbach wrote:

the default maximum size of the tcp send and receive buffer used by the 
autosizing algorithm is way too small, when trying to get maximum speed with 
high bandwidth and high latency connections.

I have tweaked SB_MAX on a system too, but it was for UDP.

When running a busy Unbound resolver, the recommendation is to bump the
receive buffer to 4M or even 8M. See
http://unbound.net/documentation/howto_optimise.html

Otherwise a lot of queries are dropped when the cache is cold.

I don't think there's a magic value that's right for everyone, so a
sysctl would be nice.  Maybe separate ones for tcp and udp.

I know similar sysctl's have been removed recently, and that they are
sometimes abused, but I'd say we have two valid use cases now.

So I'd love some more discussion.  :-)

since they were removed, and there is this keep it simple, and too many
knobs are bad attitude, which I think is not too bad, I just bumped the
SB_MAX value.
If there is consensus that a sysctl would make sense, I'd also look into
that approach and send new patch.


SB_MAX is there to protect your system. It gives an upper bound on how much
memory a socket may allocate. The current value is a compromise. Running
with a huge SB_MAX may make one connection faster but it will cause
resource starvation issues on busy systems.
Sure you can bump it but be aware of the consequences (and it is why I
think we should not bump it at the moment). A proper change needs to
include some sort of resource management that ensures that we do not run
the kernel out of memory.

How many high speed, high latency connections would it take to use a
significant proportion of kernel memory? Waving hands at the problem:
at a 500 ms round-trip delay, a saturated 1Gb/s link needs about 63MB
of buffer per direction, and a 100Mb/s link about 7MB per direction.
Multiple sockets on such a link should use a similar amount in total
using the autosizing algorithm. If this is approximately correct,
documenting a formula might be useful for sysadmins.
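
For what it's worth, that hand-waving is just bandwidth times round-trip
time (the bandwidth-delay product); a throwaway sketch with the same
example numbers (assumed, not measured):

#include <stdio.h>

int
main(void)
{
	double bits_per_sec = 1e9;	/* example: saturated 1Gb/s link */
	double rtt = 0.5;		/* example: 500 ms round trip */
	double bytes = bits_per_sec * rtt / 8;	/* bandwidth-delay product */

	/* prints ~62.5 MB per direction, the ~63MB figure above */
	printf("%.1f MB of socket buffer per direction\n", bytes / 1e6);
	return 0;
}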


A system with 512MB physical memory should be able to saturate 2 or 3 
100Mb/s links with large delays without seriously depleting kernel 
memory. It seems unlikely that a small system with multiple saturated 
1Gb/s links (or 1 10Gb/s link) could do anything very useful.


The pathological case: many sockets each one sequentially saturating the 
link and then going idle. The current limit does not defend against this.


To generalize this problem: kernel memory is limited. It is autosized at 
boot time. Allowing any kernel subsystem to use a large amount 
jeopardizes system stability.


Does it make sense, philosophically and technically, to allow the 
sysadmin to add physical memory to the kernel at run time, perhaps 
limited to (arbitrarily) 50% of physical memory?


Geoff Steckel



Re: ifconfig ieee80211 scan error to stderr

2011-12-01 Thread Geoff Steckel

On 12/02/2011 12:35 AM, Philip Guenther wrote:

On Thu, Dec 1, 2011 at 6:45 PM, Christiano F. Haesbaert
haesba...@openbsd.org  wrote:

Hi, I think we should warn() on any error, not just EPERM.
This is more consistent with the rest of the code.

ok ?

I disagree with this.  The existing message is much clearer to the
non-root mortal user that gets it and, IMO, it's clearer for the
message to be sent to stdout with the rest of the output from the
scan.

As for errors other than EPERM, well, exactly what other errors *can*
that call return in ifconfig?

The existing error message should be retained as Mr. Guenther says. An
"else" clause reporting any other error is very desirable even if other
errors are not anticipated.


The general principle of "if an error is reported to you that you don't
understand, pass it up the stack until somebody can record it or do
something about it" is important. It will save the maintainer's sanity
when the kernel or library changes or adds functionality. 99.9% of the
reporting code will never be executed. Trade that cost against weeks of
frustration.
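
A sketch of the pattern being argued for (not ifconfig's actual code; the
function and its "request" argument are stand-ins for whatever ioctl the
scan issues):

#include <sys/ioctl.h>
#include <err.h>
#include <errno.h>
#include <stdio.h>

/* keep the friendly message for the one error an unprivileged user can
 * act on, and pass anything unexpected up via warn() so it is recorded */
int
scan_ioctl(int s, unsigned long request, void *arg)
{
	if (ioctl(s, request, arg) == -1) {
		if (errno == EPERM)
			printf("no permission to scan\n");
		else
			warn("scan ioctl %lu", request);
		return (-1);
	}
	return (0);
}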


I'd be glad to share gory descriptions of weeks spent chasing unreported 
errors off line.


Geoff Steckel



increase MAXPHYS?

2011-11-11 Thread Geoff Steckel

Is there a reason MAXPHYS (currently = 64 * 1024) should not be raised,
at least on i386/amd64?

It appears that 64K transfers on a 120 MB/sec disk incur about
1800 interrupts/second. The limit seems low for a modern system
and the interrupt rate is high.
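
The arithmetic behind that figure (and the ~400/sec seen later with 256K
transfers) is just throughput divided by transfer size, assuming one
completion interrupt per transfer; a throwaway sketch with an assumed
example disk speed:

#include <stdio.h>

int
main(void)
{
	double disk_bytes_per_sec = 120e6;	/* example: ~120 MB/s disk */
	int xfer_sizes[] = { 64 * 1024, 256 * 1024 };
	int i;

	/* prints ~1831/sec for 64K and ~458/sec for 256K transfers */
	for (i = 0; i < 2; i++)
		printf("%dK transfers: ~%.0f interrupts/sec\n",
		    xfer_sizes[i] / 1024,
		    disk_bytes_per_sec / xfer_sizes[i]);
	return 0;
}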

Does anyone see bad consequences from raising MAXPHYS to 256K
and reducing the interrupt rate? Yes, it would mean more pages
locked down for larger transfers.

I ask this because on almost identical motherboards and CPUs
and exactly identical drives, Linux is faster (approx. 143MB/sec vs. 
114MB/sec).

Linux is using a *lot* more CPU (approx 20% system & interrupt time).
OpenBSD is almost idle. It is taking about 1000 interrupts per second
that appear to be disk-related.

My guess from the Linux vmstat numbers is that it is reading
128K or more at a time - approx 1000 transfers/sec and approx 143 MB/sec.
It may be queueing readahead requests which would overlap interrupt
service with hardware I/O - I don't know.

uname -a
OpenBSD lib.oat.com 4.9 GENERIC.MP#819 amd64

dd if=/dev/rsd1c bs=2m of=/dev/null count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 184.068 secs (113932934 bytes/sec)

uname -a
Linux ping 2.6.32-34-generic-pae #77-Ubuntu SMP Tue Sep 13 21:16:18 UTC 
2011 i686 GNU/Linux


dd if=/dev/sdb of=/dev/null bs=2M count=10000
10000+0 records in
10000+0 records out
20971520000 bytes (21 GB) copied, 147.096 s, 143 MB/s

It's easy to make this change. Finding any bad consequences
would be harder, and I'd like any available wisdom first.
I have a reason to push disks as fast as hardware
allows and want to reduce software-induced bottlenecks
as much as possible.

thanks!
  Geoff Steckel



scratch increasing MAXPHYS

2011-11-11 Thread Geoff Steckel

Increasing MAXPHYS to 256K shows a few places where it's assumed that
there are 16 pages in MAXPHYS.

In dev/ic/ahci.c I had to make this change @307 to make the
scatter-gather table large enough - 1 entry per page + extra
because that's what the previous code had and didn't say why.
I could understand +1 because a lot of code works that way.

/* this makes ahci_cmd_table 512 bytes, supporting 128-byte alignment */
/* #define AHCI_MAX_PRDT	24	too small for 256K of 4K pages */
/* extra 12 is to match old 16 + 8 */
#define AHCI_MAX_PRDT	((MAXPHYS / PAGE_SIZE) + 8)

Grep-ing shows at least dev/ic/osiopvar.h doesn't compute
DMA resources from MAXPHYS. There are probably other 17s buried
in ugly places.

It doesn't seem to help disk I/O speed at all.
It *does* decrease interrupt rate to about 400/sec.

Now to try some other tests. Grumble.
   Geoff Steckel



async i/o (aio and lio) desirable?

2011-10-16 Thread Geoff Steckel
Is there any feeling that aio_* and lio_* functionality is desirable? Do 
rthreads supersede them?


If aio_ is desirable, what is the consensus about the tradeoff between
the number of kernel threads vs. code complexity to use fewer threads to
service multiple processes' requests?


I took a look at the FreeBSD aio code, which might not map to OpenBSD low
level routines, and at the swapper code, which might (though single threaded)
be a good place to start for low level code.


All opinions welcome!
  thanks
  Geoff Steckel



Re: ksh history corruption

2011-08-31 Thread Geoff Steckel

On 08/31/2011 03:42 PM, Marco Peereboom wrote:

Version 4 fixes all reported bugs.

Some folks have expressed doubt about the simplistic way of updating the
history file.  Specifically the rewriting of all entries.  I am
sensitive to that and know a couple of optimizations that can easily be
applied.  However before I go there I'd like to get a thumbs up or down
on this approach.  It trashes the binary history file format and
replaces it with flat text.  Is this something we want?
IMnsHO, external non-text files have serious maintenance problems 
including version dependency. Does the external binary file have any 
significant advantages over flat text? If not, my experience is that 
flat text is 99+% a better choice for maintainability, 
interchangeability, and general obviousness.


If an internal binary format has significant advantages, is the cost of 
conversion significant (coding time and execution time?) If not, go with 
an external text format for the above reasons.


Pure appends have a stylistic appeal as well.

Anecdotally, almost no-one has been able to show me real-world 
efficiency gains from binary files for applications where a text file 
works, especially for ones read once and/or written once per program 
invocation.


geoff steckel
gwes at oat mumble com