sync libcbor to 0.7.0

2020-07-29 Thread Damien Miller
Hi,

This syncs lib/libcbor from our v0.5.0+patches to the released v0.7.0

AFAIK the changes are mostly inconsequential to the current uses
in-tree (there is a stack exhaustion fix that is worth having), but
being at an actual release rather than a frankenpatch will make future
updates a bit easier.

Major bump because of one API removal.

ok?

Index: Makefile
===
RCS file: /cvs/src/lib/libcbor/Makefile,v
retrieving revision 1.2
diff -u -p -r1.2 Makefile
--- Makefile15 Nov 2019 03:19:39 -  1.2
+++ Makefile30 Jul 2020 04:31:37 -
@@ -10,6 +10,7 @@ SRCS= cbor.c
 
 WARNINGS=yes
 CDIAGFLAGS+=   -Wall -Wextra -Wno-unused-parameter
+CDIAGFLAGS+=   -Wno-missing-field-initializers
 #CDIAGFLAGS+=  -Werror
 
 # cbor/
Index: README.openbsd
===
RCS file: /cvs/src/lib/libcbor/README.openbsd,v
retrieving revision 1.1
diff -u -p -r1.1 README.openbsd
--- README.openbsd  14 Nov 2019 21:11:34 -  1.1
+++ README.openbsd  30 Jul 2020 04:31:37 -
@@ -1,4 +1,4 @@
-This is an import of https://github.com/pjk/libcbor v0.5.0
+This is an import of https://github.com/pjk/libcbor v0.7.0
 
 Apart from README.md and LICENSE.md, only the src/ directory has been
 imported.
Index: shlib_version
===
RCS file: /cvs/src/lib/libcbor/shlib_version,v
retrieving revision 1.3
diff -u -p -r1.3 shlib_version
--- shlib_version   28 Nov 2019 21:18:16 -  1.3
+++ shlib_version   30 Jul 2020 04:31:37 -
@@ -1,2 +1,2 @@
-major=0
-minor=6
+major=1
+minor=0
Index: src/allocators.c
===
RCS file: /cvs/src/lib/libcbor/src/allocators.c,v
retrieving revision 1.2
diff -u -p -r1.2 allocators.c
--- src/allocators.c28 Nov 2019 02:58:39 -  1.2
+++ src/allocators.c30 Jul 2020 04:31:37 -
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014-2019 Pavel Kalvoda 
+ * Copyright (c) 2014-2020 Pavel Kalvoda 
  *
  * libcbor is free software; you can redistribute it and/or modify
  * it under the terms of the MIT license. See LICENSE for details.
Index: src/cbor.c
===
RCS file: /cvs/src/lib/libcbor/src/cbor.c,v
retrieving revision 1.2
diff -u -p -r1.2 cbor.c
--- src/cbor.c  28 Nov 2019 02:58:39 -  1.2
+++ src/cbor.c  30 Jul 2020 04:31:37 -
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014-2019 Pavel Kalvoda 
+ * Copyright (c) 2014-2020 Pavel Kalvoda 
  *
  * libcbor is free software; you can redistribute it and/or modify
  * it under the terms of the MIT license. See LICENSE for details.
@@ -323,8 +323,7 @@ static void _cbor_nested_describe(cbor_i
   fprintf(out, "%*s[CBOR_TYPE_FLOAT_CTRL] ", indent, " ");
   if (cbor_float_ctrl_is_ctrl(item)) {
 if (cbor_is_bool(item))
-  fprintf(out, "Bool: %s\n",
-  cbor_ctrl_is_bool(item) ? "true" : "false");
+  fprintf(out, "Bool: %s\n", cbor_get_bool(item) ? "true" : "false");
 else if (cbor_is_undef(item))
   fprintf(out, "Undefined\n");
 else if (cbor_is_null(item))
Index: src/cbor.h
===
RCS file: /cvs/src/lib/libcbor/src/cbor.h,v
retrieving revision 1.2
diff -u -p -r1.2 cbor.h
--- src/cbor.h  28 Nov 2019 02:58:39 -  1.2
+++ src/cbor.h  30 Jul 2020 04:31:37 -
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014-2019 Pavel Kalvoda 
+ * Copyright (c) 2014-2020 Pavel Kalvoda 
  *
  * libcbor is free software; you can redistribute it and/or modify
  * it under the terms of the MIT license. See LICENSE for details.
Index: src/cbor/arrays.c
===
RCS file: /cvs/src/lib/libcbor/src/cbor/arrays.c,v
retrieving revision 1.2
diff -u -p -r1.2 arrays.c
--- src/cbor/arrays.c   28 Nov 2019 02:58:39 -  1.2
+++ src/cbor/arrays.c   30 Jul 2020 04:31:37 -
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014-2019 Pavel Kalvoda 
+ * Copyright (c) 2014-2020 Pavel Kalvoda 
  *
  * libcbor is free software; you can redistribute it and/or modify
  * it under the terms of the MIT license. See LICENSE for details.
Index: src/cbor/arrays.h
===
RCS file: /cvs/src/lib/libcbor/src/cbor/arrays.h,v
retrieving revision 1.2
diff -u -p -r1.2 arrays.h
--- src/cbor/arrays.h   28 Nov 2019 02:58:39 -  1.2
+++ src/cbor/arrays.h   30 Jul 2020 04:31:37 -
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014-2019 Pavel Kalvoda 
+ * Copyright (c) 2014-2020 Pavel Kalvoda 
  *
  * libcbor is free software; you can redistribute it and/or modify
  * it under the terms of the MIT license. See LICENSE for details.
Index: src/cbor/bytestrings.c
===
RCS file: 

rdomain.4: route -T takes an rtable, not rdomain

2020-07-29 Thread Klemens Nanni
Multiple rtables may exist in the default rdomain (0), that is their
corresponding rdomains/lo(4) interfaces do not have to exist.

This demonstrates it;  first, nothing but default, so route(8) fails:

# netstat -R
Rdomain 0
  Interfaces: lo0 vio0 enc0
  Routing table: 0

# route -T1 exec id -R
route: routing table 1: No such file or directory

Then create an rdomain and with it an rtable:

# ifconfig lo1 rdomain 1
# netstat -R
Rdomain 0
  Interfaces: lo0 vio0 enc0
  Routing table: 0

Rdomain 1
  Interface: lo1
  Routing table: 1

This makes route(8) work, but it keeps working when we remove the
rdomain again since the rtable persits:

# route -T1 exec id -R
1
# ifconfig lo1 destroy
# netstat -R
Rdomain 0
  Interfaces: lo0 vio0 enc0
  Routing tables: 0 1

# route -T1 exec id -R
1


I'm not sure yet, whether this is intentional or in fact a bug.
Either ways, the manual should be fixed - route(8)'s synopsis says the
same, just like ping(8)'s `-V rtable':

$ man -hs8 route
route [-dnqtv] [-T rtable] command [[modifiers] args]

Feedback? Objections? OK?


Index: share/man/man4/rdomain.4
===
RCS file: /cvs/src/share/man/man4/rdomain.4,v
retrieving revision 1.13
diff -u -p -r1.13 rdomain.4
--- share/man/man4/rdomain.41 Feb 2020 15:00:20 -   1.13
+++ share/man/man4/rdomain.430 Jul 2020 01:56:39 -
@@ -98,7 +98,7 @@ Put em0 and lo4 in rdomain 4:
 # ifconfig em0 192.0.2.100/24
 .Ed
 .Pp
-Set a default route and localhost reject route within rdomain 4:
+Set a default route and localhost reject route within rtable 4:
 .Bd -literal -offset indent
 # route -T4 -qn add -net 127 127.0.0.1 -reject
 # route -T4 -n add default 192.0.2.1
@@ -106,7 +106,7 @@ Set a default route and localhost reject
 .Pp
 Start
 .Xr sshd 8
-in rdomain 4:
+in rtable 4:
 .Pp
 .Dl # route -T4 exec /usr/sbin/sshd
 .Pp



Re: Add ability to set control values with video(1)

2020-07-29 Thread Marcus Glocker
On Wed, 29 Jul 2020 07:45:52 +0200
Marcus Glocker  wrote:

> On Sat, 25 Jul 2020 18:27:45 +0100
> Laurence Tratt  wrote:
> 
> > On Sat, Jul 25, 2020 at 10:27:15AM -0600, Theo de Raadt wrote:
> > 
> > Hello Theo,
> >   
> > > My primary concern is about a user changing settings which then
> > > persist past close.
> > >
> > > Upon re-open, how do they know what mode they are in?
> > >
> > > I understand the default mode for a camera might not be nice.  But
> > > at least it is a default.  If the previous use of the camera put
> > > it into a nasty mode, does this mean all new users must first
> > > force default?
> > >
> > > Now you don't know what mode it is in.  As a result, you must
> > > *always* demand change into a specific mode.  Rather than making
> > > things simpler, doesn't use of a camera become potentially more
> > > complicated?
> > 
> > From what I can tell, there are two ways to control a uvideo device:
> > 
> >   1) The "semi-persistent" state changes that video(1) can make that
> > affects subsequent apps which access the device. My patch simply
> > makes those state changes possible from the command-line instead of
> > forcing the user to open a video and hold down keys until they reach
> > the desired state. In otReset all supported controls to their
> > default value and quit.her words, it doesn't change how you control
> > the device: "-c reset" is equivalent to running video(1) and
> > pressing "r", for example.
> > 
> >   2) Control via a loopback device. For example, on Windows,
> > Logitech allow you to change controls in their app where you can
> > see video; they then expose a second internal device which other
> > apps can use; I think controls are reset when the Logitech app is
> > closed.
> > 
> > On Linux I believe v4l2-ctl works as video(1) does (semi-persistent
> > state changes) but Linux also has video loopback devices. Presumably
> > they could do something similar to the Logitech Windows app, but I
> > don't know if they do so or not.
> > 
> > Unless we develop a loopback facility (or, perhaps, some sort of
> > uvideo daemon roughly equivalent to sndiod), I don't think we have
> > much choice but to continue with the semi-persistent state changes
> > that video(1) has always been capable of. It is a bit icky, but it's
> > the only way, for example, to change a webcam's brightness before
> > taking a video call in a web browser.  
> 
> OK - Let me try to pick this thread up once more :-)
> 
> Lets assume we leave the current behaviour of video(1) as is, which
> doesn't reset back the default control values on device close.
> 
> This adapted patch adds two new options and the ability to set
> multiple control values on the CLI with a similar interface as
> sysctl(8) has:
> 
> * c: List all current supported control values and quit
>   (-q is already occupied).
> 
> * d: Reset all supported controls to their default value and quit
>   (-r is already occupied).
>   -> This does the same thing as when pressing 'r' in the GUI.  
> 
> Some examples:
> 
>   $ doas video -c
>   brightness=128
>   contrast=32
>   saturation=64
>   hue=0
>   gamma=120
>   sharpness=2
>   white_balance_temperature=4000
>   $
> 
>   $ doas video brightness=200 sharpness=4 
>   brightness: 128 -> 200
>   sharpness: 2 -> 4
>   $
> 
>   $ doas video brightness
>   brightness: 200
>   $
> 
>   $ doas video -c
>   brightness=200
>   contrast=32
>   saturation=64
>   hue=0
>   gamma=120
>   sharpness=4
>   white_balance_temperature=4000
>   $
> 
>   $ doas video -d
>   $
> 
>   $ doas video -c
>   brightness=128
>   contrast=32
>   saturation=64
>   hue=0
>   gamma=120
>   sharpness=2
>   white_balance_temperature=4000
>   $
> 
>   $ doas video -f /dev/video1 gain=1
>   gain: 0 -> 1
>   $
> 
>   $ doas video -f /dev/video1 -c
>   brightness=128
>   contrast=128
>   saturation=128
>   gain=1
>   sharpness=128
>   white_balance_temperature=4000
>   $
> 
> What we could think about optionally is to reset to the default
> control values on device close when we did use the GUI mode, but
> leave the controls sticky when we set them through the CLI, and
> explicitly document that in the video(1) man page for the
> [control[=value]] arguments.
> 
> But entirely removing the sticky controls I think is odd since as
> already mentioned before, it will remove the ability to preset any
> controls before we enter an application where we can't change them,
> like in an browser.  And web conference calls are pretty common
> nowadays ...

Slightly adapted diff;  Negative numbers can happen on controls.


Index: video.1
===
RCS file: /cvs/xenocara/app/video/video.1,v
retrieving revision 1.15
diff -u -p -u -p -r1.15 video.1
--- video.1 17 Jul 2020 07:51:23 -   

Re: kernel crash in setrunqueue

2020-07-29 Thread Mike Larkin
On Wed, Jul 29, 2020 at 10:14:11PM +0200, Mark Kettenis wrote:
> > Date: Wed, 29 Jul 2020 13:03:43 -0700
> > From: Mike Larkin 
> >
> > Hi,
> >
> >  I'm seeing crashes on amd64 GENERIC.MP on a few VMs recently. This happens
> > on GENERIC.MP regardless of whether or not the VM has one cpu or more than
> > one. It does not happen on GENERIC kernels.
> >
> >  The crash will happen fairly quickly after the kernel starts executing
> > processes. Sometimes it crashes instantly, sometimes it lasts for a minute
> > or two. It rarely makes it to the login prompt. The problem is 100%
> > reproducible on two different VMs I have, running on two different
> > hypervisors (Hyper-V and ESXi6.7U2).
> >
> >  I first started noticing the problem on the 24th July snap, but TBH these
> > machines were not frequently updated, so the previous snap I had installed
> > might have been a couple months old. Whatever older snap was on them before
> > worked fine.
> >
> >  Since this is happening on two different machines with two different VMs,
> > I'm gonna rule out hardware issues.
> >
> >  Crash:
> >
> > kernel: pretection fault trap, code=0
> > Stopped at  setrunqueue+0xa2:   addl$0x1,0x288(%r13)
> >
> >  Trace:
> > ddb{2}> trace
> > setrunqueue(27b3d6c24c3fab80, 800015e874e0,32) at setrunqueue+0xa2
> > sched_barrier_task(800015f1a168) at sched_barrier_task+0x6c
> > taskq_thread(82121548) at taskq_thread+0x8d
> > end trace frame: 0x0, count: -3
> >
> >  Registers:
> > ddb{2}> sh r
> > rdi 0x821ee728  sched_lock
> > rsi 0x800014cc6ff0
> > rbp 0x800015ea0e40
> > rbx  0
> > rdx   0x23ca94  acpi_pdirpa_0x2288fc
> > rcx0xc
> > rax0xc
> > r8   0x202
> > r9 0x2
> > r10  0
> > r11 0x57f79bf6968709d8
> > r12 0x800015e874e0
> > r13 0x27b3d6c24c3fab80
> > r14   0x32
> > r15 0x27b3d6c24c3fab80
> > rip 0x81b9df22  setrunqueue+0xa2
> > cs 0x8
> > rflags 0x10207  __ALIGN_SIZE+0xf207
> > rsp 0x800015ea0df0
> > ss0x10
> >
> >
> > The offending instruction is in kern_sched.c:260:
> >
> > spc->spc_nrun++;
> >
> > ... which indicates 'spc' is trash (and it is, based on %r13 above). In my
> > tests, %r13 always is this same trash value. That comes from 'ci', which is
> > either passed in or chosen by sched_choosecpu. Neither of these functions
> > have changed recently, so I'm guessing this corruption is coming from 
> > something
> > else.
> >
> >  Anyone have ideas where to start looking? I suppose I could start 
> > bisecting,
> > but does anyone know of any changes that would affect this area?
> >
> >  I can send dmesgs if needed, but these are pretty standard VMs,
> > nothing fancy configured in them. 4 CPUs, 8GB RAM, etc.
>
> They're VMs and it turns out that many of the "PV" drivers are/were
> using the intr_barrier() interface the wrong way.
>
> For Hyper-V, see my reply in the "Panic on boot with Hyper-V since Jun
> 17 snapshot" thread on bugs@ from earlier today.
>
> Cheers,
>
> Mark
>

Thanks. I don't subscribe to bugs@ anymore, so that's why I likely missed it.

-ml



Re: kernel crash in setrunqueue

2020-07-29 Thread Mike Larkin
On Wed, Jul 29, 2020 at 01:03:43PM -0700, Mike Larkin wrote:
> Hi,
>
>  I'm seeing crashes on amd64 GENERIC.MP on a few VMs recently. This happens
> on GENERIC.MP regardless of whether or not the VM has one cpu or more than
> one. It does not happen on GENERIC kernels.
>
>  The crash will happen fairly quickly after the kernel starts executing
> processes. Sometimes it crashes instantly, sometimes it lasts for a minute
> or two. It rarely makes it to the login prompt. The problem is 100%
> reproducible on two different VMs I have, running on two different
> hypervisors (Hyper-V and ESXi6.7U2).
>
>  I first started noticing the problem on the 24th July snap, but TBH these
> machines were not frequently updated, so the previous snap I had installed
> might have been a couple months old. Whatever older snap was on them before
> worked fine.
>
>  Since this is happening on two different machines with two different VMs,
> I'm gonna rule out hardware issues.
>
>  Crash:
>
> kernel: pretection fault trap, code=0
> Stopped atsetrunqueue+0xa2:   addl$0x1,0x288(%r13)
>
>  Trace:
> ddb{2}> trace
> setrunqueue(27b3d6c24c3fab80, 800015e874e0,32) at setrunqueue+0xa2
> sched_barrier_task(800015f1a168) at sched_barrier_task+0x6c
> taskq_thread(82121548) at taskq_thread+0x8d
> end trace frame: 0x0, count: -3
>
>  Registers:
> ddb{2}> sh r
> rdi   0x821ee728  sched_lock
> rsi   0x800014cc6ff0
> rbp   0x800015ea0e40
> rbx0
> rdx 0x23ca94  acpi_pdirpa_0x2288fc
> rcx  0xc
> rax  0xc
> r8 0x202
> r9   0x2
> r100
> r11   0x57f79bf6968709d8
> r12   0x800015e874e0
> r13   0x27b3d6c24c3fab80
> r14 0x32
> r15   0x27b3d6c24c3fab80
> rip   0x81b9df22  setrunqueue+0xa2
> cs   0x8
> rflags   0x10207  __ALIGN_SIZE+0xf207
> rsp   0x800015ea0df0
> ss  0x10
>
>
> The offending instruction is in kern_sched.c:260:
>
>   spc->spc_nrun++;
>
> ... which indicates 'spc' is trash (and it is, based on %r13 above). In my
> tests, %r13 always is this same trash value. That comes from 'ci', which is
> either passed in or chosen by sched_choosecpu. Neither of these functions
> have changed recently, so I'm guessing this corruption is coming from 
> something
> else.
>
>  Anyone have ideas where to start looking? I suppose I could start bisecting,
> but does anyone know of any changes that would affect this area?
>
>  I can send dmesgs if needed, but these are pretty standard VMs, nothing fancy
> configured in them. 4 CPUs, 8GB RAM, etc.
>
> -ml
>

Also I should note that the problem happens with snaps as well as kernels built
from source (-current), so this isn't likely something that is in snaps but not
yet in tree.

-ml



Re: kernel crash in setrunqueue

2020-07-29 Thread Mark Kettenis
> Date: Wed, 29 Jul 2020 13:03:43 -0700
> From: Mike Larkin 
> 
> Hi,
> 
>  I'm seeing crashes on amd64 GENERIC.MP on a few VMs recently. This happens
> on GENERIC.MP regardless of whether or not the VM has one cpu or more than
> one. It does not happen on GENERIC kernels.
> 
>  The crash will happen fairly quickly after the kernel starts executing
> processes. Sometimes it crashes instantly, sometimes it lasts for a minute
> or two. It rarely makes it to the login prompt. The problem is 100%
> reproducible on two different VMs I have, running on two different
> hypervisors (Hyper-V and ESXi6.7U2).
> 
>  I first started noticing the problem on the 24th July snap, but TBH these
> machines were not frequently updated, so the previous snap I had installed
> might have been a couple months old. Whatever older snap was on them before
> worked fine.
> 
>  Since this is happening on two different machines with two different VMs,
> I'm gonna rule out hardware issues.
> 
>  Crash:
> 
> kernel: pretection fault trap, code=0
> Stopped atsetrunqueue+0xa2:   addl$0x1,0x288(%r13)
> 
>  Trace:
> ddb{2}> trace
> setrunqueue(27b3d6c24c3fab80, 800015e874e0,32) at setrunqueue+0xa2
> sched_barrier_task(800015f1a168) at sched_barrier_task+0x6c
> taskq_thread(82121548) at taskq_thread+0x8d
> end trace frame: 0x0, count: -3
> 
>  Registers:
> ddb{2}> sh r
> rdi   0x821ee728  sched_lock
> rsi   0x800014cc6ff0
> rbp   0x800015ea0e40
> rbx0
> rdx 0x23ca94  acpi_pdirpa_0x2288fc
> rcx  0xc
> rax  0xc
> r8 0x202
> r9   0x2
> r100
> r11   0x57f79bf6968709d8
> r12   0x800015e874e0
> r13   0x27b3d6c24c3fab80
> r14 0x32
> r15   0x27b3d6c24c3fab80
> rip   0x81b9df22  setrunqueue+0xa2
> cs   0x8
> rflags   0x10207  __ALIGN_SIZE+0xf207
> rsp   0x800015ea0df0
> ss  0x10
> 
> 
> The offending instruction is in kern_sched.c:260:
> 
>   spc->spc_nrun++;
> 
> ... which indicates 'spc' is trash (and it is, based on %r13 above). In my
> tests, %r13 always is this same trash value. That comes from 'ci', which is
> either passed in or chosen by sched_choosecpu. Neither of these functions
> have changed recently, so I'm guessing this corruption is coming from 
> something
> else.
> 
>  Anyone have ideas where to start looking? I suppose I could start bisecting,
> but does anyone know of any changes that would affect this area?
> 
>  I can send dmesgs if needed, but these are pretty standard VMs,
> nothing fancy configured in them. 4 CPUs, 8GB RAM, etc.

They're VMs and it turns out that many of the "PV" drivers are/were
using the intr_barrier() interface the wrong way.

For Hyper-V, see my reply in the "Panic on boot with Hyper-V since Jun
17 snapshot" thread on bugs@ from earlier today.

Cheers,

Mark



kernel crash in setrunqueue

2020-07-29 Thread Mike Larkin
Hi,

 I'm seeing crashes on amd64 GENERIC.MP on a few VMs recently. This happens
on GENERIC.MP regardless of whether or not the VM has one cpu or more than
one. It does not happen on GENERIC kernels.

 The crash will happen fairly quickly after the kernel starts executing
processes. Sometimes it crashes instantly, sometimes it lasts for a minute
or two. It rarely makes it to the login prompt. The problem is 100%
reproducible on two different VMs I have, running on two different
hypervisors (Hyper-V and ESXi6.7U2).

 I first started noticing the problem on the 24th July snap, but TBH these
machines were not frequently updated, so the previous snap I had installed
might have been a couple months old. Whatever older snap was on them before
worked fine.

 Since this is happening on two different machines with two different VMs,
I'm gonna rule out hardware issues.

 Crash:

kernel: pretection fault trap, code=0
Stopped at  setrunqueue+0xa2:   addl$0x1,0x288(%r13)

 Trace:
ddb{2}> trace
setrunqueue(27b3d6c24c3fab80, 800015e874e0,32) at setrunqueue+0xa2
sched_barrier_task(800015f1a168) at sched_barrier_task+0x6c
taskq_thread(82121548) at taskq_thread+0x8d
end trace frame: 0x0, count: -3

 Registers:
ddb{2}> sh r
rdi 0x821ee728  sched_lock
rsi 0x800014cc6ff0
rbp 0x800015ea0e40
rbx  0
rdx   0x23ca94  acpi_pdirpa_0x2288fc
rcx0xc
rax0xc
r8   0x202
r9 0x2
r10  0
r11 0x57f79bf6968709d8
r12 0x800015e874e0
r13 0x27b3d6c24c3fab80
r14   0x32
r15 0x27b3d6c24c3fab80
rip 0x81b9df22  setrunqueue+0xa2
cs 0x8
rflags 0x10207  __ALIGN_SIZE+0xf207
rsp 0x800015ea0df0
ss0x10


The offending instruction is in kern_sched.c:260:

spc->spc_nrun++;

... which indicates 'spc' is trash (and it is, based on %r13 above). In my
tests, %r13 always is this same trash value. That comes from 'ci', which is
either passed in or chosen by sched_choosecpu. Neither of these functions
have changed recently, so I'm guessing this corruption is coming from something
else.

 Anyone have ideas where to start looking? I suppose I could start bisecting,
but does anyone know of any changes that would affect this area?

 I can send dmesgs if needed, but these are pretty standard VMs, nothing fancy
configured in them. 4 CPUs, 8GB RAM, etc.

-ml



pipex(4): kill pipexintr()

2020-07-29 Thread Vitaliy Makkoveev
Now pipex(4) is fully covered by NET_LOCK() and this is documented. But
we still have an issue with pipex(4) session itself and I guess it's
time to fix it.

We have `pipexinq' and `pipexoutq' mbuf(9) queues to store mbufs. Each
mbuf(9) passed to these queues stores the pointer to corresponding
session referenced as `m_pkthdr.ph_cookie'. We enqueue incoming mbufs for
pppx(4) and incoming and outgoing mbufs for pppac(4). But we don't
enqueue pppoe related mbufs. After packet was enqueued to corresponding
queue we call schednetisr() which just schedules netisr() to run:

 cut begin 

780 pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session,
781 struct mbuf_queue *mq)
782 {
783 m0->m_pkthdr.ph_cookie = session;
784 /* XXX need to support other protocols */
785 m0->m_pkthdr.ph_ppp_proto = PPP_IP;
786 
787 if (mq_enqueue(mq, m0) != 0)
788 return (1);
789 
790 schednetisr(NETISR_PIPEX);
791 
792 return (0);
793 }

 cut end 

Also we have pipex_timer() which should destroy session in safe way, but
it does this only for pppac(4) and only for sessions closed by
`PIPEXDSESSION' command:

 cut begin 

812 pipex_timer(void *ignored_arg)
813 {
/* skip */
846 case PIPEX_STATE_CLOSED:
847 /*
848  * mbuf queued in pipexinq or pipexoutq may have a
849  * refererce to this session.
850  */
851 if (!mq_empty() || !mq_empty())
852 continue;
853 
854 pipex_destroy_session(session);
855 break;

 cut end 

While we destroy sessions through pipex_rele_session() or through
pipex_iface_fini() or through `PIPEXSMODE' command we don't check
`pipexinq' and `pipexoutq' state. This means we can break them.

It's not guaranteed that netisr() will start just after schednetisr()
call. This means we can destroy session, but corresponding mbuf(9) is
stored within `pipexinq' or `pipexoutq'. It's `m_pkthdr.ph_cookie' still
stores pointer to destroyed session and we have use after free issue. I
wonder why we didn't caught panic yet.

I propose to kill `pipexinq', `pipexoutq' and pipexintr(). There is
absolutely no reason them to exist. This should not only fix issue
described above but simplifies code too.

Other ways are to implement reference counters for session or walk
through mbuf(9) queues and kill corresponding mbufs. It doesn't make
sense to go these ways.

Index: lib/libc/sys/sysctl.2
===
RCS file: /cvs/src/lib/libc/sys/sysctl.2,v
retrieving revision 1.40
diff -u -p -r1.40 sysctl.2
--- lib/libc/sys/sysctl.2   17 May 2020 05:48:39 -  1.40
+++ lib/libc/sys/sysctl.2   29 Jul 2020 13:47:40 -
@@ -2033,35 +2033,11 @@ The currently defined variable names are
 .Bl -column "Third level name" "integer" "Changeable" -offset indent
 .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable"
 .It Dv PIPEXCTL_ENABLE Ta integer Ta yes
-.It Dv PIPEXCTL_INQ Ta node Ta not applicable
-.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable
 .El
 .Bl -tag -width "123456"
 .It Dv PIPEXCTL_ENABLE
 If set to 1, enable PIPEX processing.
 The default is 0.
-.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq
-Fourth level comprises an array of
-.Vt struct ifqueue
-structures containing information about the PIPEX packet input queue.
-The forth level names for the elements of
-.Vt struct ifqueue
-are the same as described in
-.Li ip.arpq
-in the
-.Dv PF_INET
-section.
-.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq
-Fourth level comprises an array of
-.Vt struct ifqueue
-structures containing information about PIPEX packet output queue.
-The forth level names for the elements of
-.Vt struct ifqueue
-are the same as described in
-.Li ip.arpq
-in the
-.Dv PF_INET
-section.
 .El
 .El
 .Ss CTL_VFS
Index: sys/net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.616
diff -u -p -r1.616 if.c
--- sys/net/if.c24 Jul 2020 18:17:14 -  1.616
+++ sys/net/if.c29 Jul 2020 13:47:44 -
@@ -909,13 +909,6 @@ if_netisr(void *unused)
KERNEL_UNLOCK();
}
 #endif
-#ifdef PIPEX
-   if (n & (1 << NETISR_PIPEX)) {
-   KERNEL_LOCK();
-   pipexintr();
-   KERNEL_UNLOCK();
-   }
-#endif
t |= n;
}
 
Index: sys/net/netisr.h
===
RCS file: /cvs/src/sys/net/netisr.h,v
retrieving revision 1.51
diff -u -p -r1.51 netisr.h
--- sys/net/netisr.h6 Aug 2019 22:57:54 -   1.51
+++ sys/net/netisr.h29 Jul 2020 13:47:44 -
@@ -48,7 +48,6 @@
 #defineNETISR_IPV6 24  /* 

Re: Edgerouter 4 available for any OpenBSD dev that needs an octeon

2020-07-29 Thread Mike Larkin
On Tue, Jul 28, 2020 at 06:16:01PM -0700, Mike Larkin wrote:
> Someone (can't recall who) gave me an ER4. I found it while cleaning
> out my closet. Since I'm not active anymore, if any openbsd developer
> wants it, reach out to me privately and I'll see about sending it
> to you.
>
> Thanks.
>
> -ml
>

Thanks everyone, this is heading to an OpenBSD developer.

-ml



brconfig: strto*l -> strtonum()

2020-07-29 Thread Klemens Nanni


Poking and testing around in brconfig.c for tpmr(4) stuff, I noticed a
lot of old code around strto*l(3).

Many pass unbounded `long' values into the `[u]int32_t' struct members
without limiting them to at least the type size the value is stored in,
some report wrong commands in error messages, e.g.

# ifconfig switch1 portno vether1 9223372036854775808   # LONG_MAX + 1
ifconfig: invalid arg for portidx: 9223372036854775808

some pass values to the kernel that are above limits documented in
ifconfig(8), e.g.

#  ifconfig bridge0 maxage 50
ifconfig: bridge0: Invalid argument

and others overflow in-kernel due to that (causing persistent damage
that causes deleting and adding bridge members to fix it).

I'm aware that moving to strtonum(3) drops the ability to pass numbers
in bases others than ten, e.g. `ifconfig bridge0 ifcost vether0 0x3'
mut now be `... 15', but all of the options really want a decimal number
as far as I can judge after reading ifconfig(8) and testing them.

The only exception to this is switch(4)'s `datapath' which is printed in
hexadecimal, so I left it untouched such that it can be copy-pasted and
set as is.

Diff below applies converts all places to the very same strtonum() idiom
with proper range checks based on documented or type limits.

I've tested each option manually: valid values are still valid and but
more out of range values are catched now, many of them up front in
ifconfig rather than the kerne's ioctl() path.

I've also checked most of the ioctl() paths to see what those really
check for.  One or two diffs for the kernel should follow soon.

Feedback? OK?


Index: brconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/brconfig.c,v
retrieving revision 1.26
diff -u -p -r1.26 brconfig.c
--- brconfig.c  29 Jul 2020 12:13:28 -  1.26
+++ brconfig.c  29 Jul 2020 17:06:09 -
@@ -418,18 +418,13 @@ void
 bridge_timeout(const char *arg, int d)
 {
struct ifbrparam bp;
-   long newtime;
-   char *endptr;
+   const char *errstr;
 
-   errno = 0;
-   newtime = strtol(arg, , 0);
-   if (arg[0] == '\0' || endptr[0] != '\0' ||
-   (newtime & ~INT_MAX) != 0L ||
-   (errno == ERANGE && newtime == LONG_MAX))
-   errx(1, "invalid arg for timeout: %s", arg);
+   bp.ifbrp_ctime = strtonum(arg, 0, UINT32_MAX, );
+   if (errstr)
+   err(1, "timeout %s is: %s", arg, errstr);
 
strlcpy(bp.ifbrp_name, ifname, sizeof(bp.ifbrp_name));
-   bp.ifbrp_ctime = newtime;
if (ioctl(sock, SIOCBRDGSTO, (caddr_t)) == -1)
err(1, "%s", ifname);
 }
@@ -438,17 +433,13 @@ void
 bridge_maxage(const char *arg, int d)
 {
struct ifbrparam bp;
-   unsigned long v;
-   char *endptr;
+   const char *errstr;
 
-   errno = 0;
-   v = strtoul(arg, , 0);
-   if (arg[0] == '\0' || endptr[0] != '\0' || v > 0xffUL ||
-   (errno == ERANGE && v == ULONG_MAX))
-   errx(1, "invalid arg for maxage: %s", arg);
+   bp.ifbrp_maxage = strtonum(arg, 6, 40, );
+   if (errstr)
+   errx(1, "maxage %s is: %s", arg, errstr);
 
strlcpy(bp.ifbrp_name, ifname, sizeof(bp.ifbrp_name));
-   bp.ifbrp_maxage = v;
if (ioctl(sock, SIOCBRDGSMA, (caddr_t)) == -1)
err(1, "%s", ifname);
 }
@@ -457,17 +448,13 @@ void
 bridge_priority(const char *arg, int d)
 {
struct ifbrparam bp;
-   unsigned long v;
-   char *endptr;
+   const char *errstr;
 
-   errno = 0;
-   v = strtoul(arg, , 0);
-   if (arg[0] == '\0' || endptr[0] != '\0' || v > 0xUL ||
-   (errno == ERANGE && v == ULONG_MAX))
-   errx(1, "invalid arg for spanpriority: %s", arg);
+   bp.ifbrp_prio  = strtonum(arg, 0, 61440, );
+   if (errstr)
+   errx(1, "spanpriority %s is: %s", arg, errstr);
 
strlcpy(bp.ifbrp_name, ifname, sizeof(bp.ifbrp_name));
-   bp.ifbrp_prio = v;
if (ioctl(sock, SIOCBRDGSPRI, (caddr_t)) == -1)
err(1, "%s", ifname);
 }
@@ -478,7 +465,7 @@ bridge_protect(const char *ifsname, cons
struct ifbreq breq;
unsigned long v;
char *optlist, *str;
-   char *endptr;
+   const char *errstr;
 
strlcpy(breq.ifbr_name, ifname, sizeof(breq.ifbr_name));
strlcpy(breq.ifbr_ifsname, ifsname, sizeof(breq.ifbr_ifsname));
@@ -491,11 +478,9 @@ bridge_protect(const char *ifsname, cons
 
str = strtok(optlist, ",");
while (str != NULL) {
-   errno = 0;
-   v = strtoul(str, , 0);
-   if (str[0] == '\0' || endptr[0] != '\0' || v == 0 || v > 31 ||
-   (errno == ERANGE && v == ULONG_MAX))
-   err(1, "invalid value for protected domain: %s", str);
+   v = strtonum(str, 1, 31, );
+   if (errstr)
+   err(1, 

Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Peter J. Philipp
On Wed, Jul 29, 2020 at 05:42:16PM +0200, Florian Obser wrote:
> > First you mention fallback to DHCP-learned resolvers.  Those you should
> > probably not trust indeed, but it looks like unwind(8) attempts to use
> > them to perform its own validation.  So the value of the AD flag in
> > unwind(8)'s response should be trustworthy.  At least that's what I see
> > when I test unwind with "preference { dhcp }".
> >
> 
> unwind always does its own validation or decides it can't do
> validation. In both cases you can trust the AD flag.
> The AD flag is set if and only if unwind did a successful validation.
> 
> You cannot trust the RCODE (and answer) when AD is *NOT* set. That
> seems like a strange and obvious statement but it is the main
> difference to validating unbound. When you enable validation in
> unbound and someone messes with your packets you will get a SERVFAIL.

Hi all,

I just put unwind on my own forwarding dns server and you can see what it does
in the following logs (I like it!):

--- here I restarted unwind, it looks up the root zone ---
Jul 29 18:06:19 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.291ms) for "." \
type=NS(2) class=1, edns0, dnssecok, answering "FORWARD" (28/83)
 
Jul 29 18:06:19 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.486ms) for "." \
type=DNSKEY(48) class=1, edns0, dnssecok, answering "FORWARD" (28/83)

Jul 29 18:06:19 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.286ms) for "." \
type=DNSKEY(48) class=1, edns0, dnssecok, answering "FORWARD" (28/83)

 it now has the anchor keys to check that these dnskeys are right 

 at this point I did a ping of centroid.eu -

Jul 29 18:07:00 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.347ms) for \
"centroid.eu." type=A(1) class=1, edns0, dnssecok, answering "FORWARD" (40/83)

--- it then proceeds to check the eu. DS and DNSKEY to do DNSSEC validation ---

Jul 29 18:07:00 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.358ms) for "eu." \
type=DS(43) class=1, edns0, dnssecok, answering "FORWARD" (31/83)

Jul 29 18:07:00 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.289ms) for "eu." \
type=DNSKEY(48) class=1, edns0, dnssecok, answering "FORWARD" (31/83)

Jul 29 18:07:00 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.292ms) for \
"centroid.eu." type=DS(43) class=1, edns0, dnssecok, answering "FORWARD" (40/83)

Jul 29 18:07:00 eta delphinusdnsd[56662]: request on descriptor 16 interface \
"cnmac1" from 192.168.177.2 (ttl=64, region=255, tta=1.346ms) for \
"centroid.eu." type=DNSKEY(48) class=1, edns0, dnssecok, answering "FORWARD" \
(40/83)

--- at this point the DNSSEC validation reverse walk is complete 

Here is the unwind .conf file I used on my network:

forwarder 192.168.177.1
preference forwarder

And here is some attempt at explaining the delphinusdnsd log format:
https://delphinusdns.org/blog/c?article=1568367256

The log FORWARD is new, it just means that delphinusdnsd gives off the
question to its own forwarding engine that forwards the request.  My setup
is as follows (in terms of dns servers):

unwind[1]->delphinusdnsd forwarder[2]->delphinusdnsd forwarder[3]->unbound[4]

The path between [2] and [3] is a secure TSIG channel on another port than 53.
[2] has a cache, [3] does not.

I assume unwind also has a cache?

Unbound does give the AD bit back here, though that's not in my logs.  I know
this though because of dnssecok and edns0 bits on the question in delphinusdnsd 
it is passed forwards.  I know also that unwind would get the AD flag here,

It is just like Florian describes it.  unwind does validation despite the AD
flag.  This is awesome!

I think I'll keep this setup for a while it can't hurt.  Ultimately I'd love
to get rid of [3] and [4] in this setup and have [2] use the ISP's powerdns
nameservers.  But [2] sits at my gateway (a little octeon USG) and can't get
the nameservers from the pppoe0 without a patch.  Because maintaining patches
that don't come with OpenBSD get discarded and overrun eventually I can only
point to the patch I sent about a month ago to tech@ that brings Gerhard
Roth's patch up to the times (but I broke it in some way, so it would need to
be fixed up a little).  I would really like to see this functionality in
OpenBSD for modifying this little setup above, *pretty please?* :)

Thanks guys for the great work and interesting discussion!

-peter



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Jeremie Courreges-Anglas
On Wed, Jul 29 2020, Florian Obser  wrote:
> On Wed, Jul 29, 2020 at 03:51:55PM +0200, Jeremie Courreges-Anglas wrote:
>> So I did a few tests and read some unwind(8) code, and it *appears* safe
>> to use unwind(8) along with "options trust-ad".
>
> Yes.
>
>> 
>> First you mention fallback to DHCP-learned resolvers.  Those you should
>> probably not trust indeed, but it looks like unwind(8) attempts to use
>> them to perform its own validation.  So the value of the AD flag in
>> unwind(8)'s response should be trustworthy.  At least that's what I see
>> when I test unwind with "preference { dhcp }".
>>
>
> unwind always does its own validation or decides it can't do
> validation. In both cases you can trust the AD flag.
> The AD flag is set if and only if unwind did a successful validation.
>
> You cannot trust the RCODE (and answer) when AD is *NOT* set. That
> seems like a strange and obvious statement but it is the main
> difference to validating unbound. When you enable validation in
> unbound and someone messes with your packets you will get a SERVFAIL.
>
> You can trick unwind into believing that DNSSEC validation is not
> possible in your current network location. It will still resolve but
> it will NOT set AD.

Thanks for confirming.

> In the context of the current discussion this means that you can't
> validate SSHFPs, but you can still try to connect to your ssh server
> if you have the fingerprint in known_hosts. The later is still save
> even if people play tricks with DNS in your current location.

Yep.

>> And then there's the asr fallback, tested with "preference { stub }".
>> unwind(8) allocates its own asr context, ignoring the global
>> /etc/resolv.conf and thus "options trust-ad".  IIUC it then hands off
>
> Yes. But even if it did not it would not matter since it will clear
> AD.
>
>> the answer to libunbound which cooks its own soup.  So it looks like
>
> No, the asr fallback is there for when your local network is truly
> fucked - think last century equipment that knows DNS is udp/53 only and
> max 512 bytes. While we already lower standards considerably in unwind
> when we use libunbound compared to unbound the network still needs to
> provide some things. Working edns0 is one of them, it's near
> impossible to turn of in libunbound.
>
>> unwind needs no special code to disable/ignore "options trust-ad".
>> 
>> I have no idea how unwind/libunbound behaves when using forwarders, the
>> "accept bogus" feature, (opportunistic) DoT or other stuff I'm probably
>
> Nope, as stated above, you can trust AD.
>
>> overlooking.  But AFAICS using unwind + "options trust-ad" brings no
>> problem.  cc'ing Florian, input welcome.
>> 
>> I think the issue with trust-ad is the list of resolvers that end up in
>> resolv.conf.  I would expect people who use a resolver on localhost to
>> also have other resolvers listed in their /etc/resolv.conf, because of
>> a conscious choice (bsd.rd) or just because it should be more reliable.
>> It seems to me that the resolv.conf(5) manpage addition should cover
>> that.  Or should I make it more explicit?
>> 
>> Index: share/man/man5/resolv.conf.5
>> ===
>> RCS file: /d/cvs/src/share/man/man5/resolv.conf.5,v
>> retrieving revision 1.60
>> diff -u -p -r1.60 resolv.conf.5
>> --- share/man/man5/resolv.conf.5 25 Apr 2020 14:22:04 -  1.60
>> +++ share/man/man5/resolv.conf.5 25 Jul 2020 02:08:17 -
>> @@ -285,6 +285,15 @@ first as an absolute name before any sea
>>  .It Cm tcp
>>  Forces the use of TCP for queries.
>>  Normal behaviour is to query via UDP but fall back to TCP on failure.
>> +.It Cm trust-ad
>> +Sets the AD flag in DNS queries, and preserve the value of the AD flag
>> +in DNS replies (this flag is stripped by default).
>> +Useful when applications want to ensure that the DNS resources they
>> +look up have been signed with DNSSEC and validated by the
>> +.Ic nameserver .
>> +This option should be used only if the transport between the
>> +.Ic nameserver
>> +and the resolver is secure.
>
> The last sentence is the problem in my opinion.

Note: that sentence was adapted from getrrsetbyname(3), stripping the
example list precisely because I didn't want to be specific.  The devil
is in the details.  I'm not even sure loopback communication be
considered secure.

> trust-ad is a button
> mere mortals cannot push.

Do mere mortals care about or rely on DNSSEC? ;)

> The typical answer is: Oh, I use a vpn tunel and my nameserver is
> behind the tunnel. Does the tunel terminate on the nameserver?
> My enterprise vpn hands me 4 resolvers. They are in a different
> subnet than the vpn concentrator.

Well one of the reasons I'm proposing a knob is that I doubt we can find
a one-size-fits-all solution.

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Florian Obser
On Wed, Jul 29, 2020 at 03:51:55PM +0200, Jeremie Courreges-Anglas wrote:
> So I did a few tests and read some unwind(8) code, and it *appears* safe
> to use unwind(8) along with "options trust-ad".

Yes.

> 
> First you mention fallback to DHCP-learned resolvers.  Those you should
> probably not trust indeed, but it looks like unwind(8) attempts to use
> them to perform its own validation.  So the value of the AD flag in
> unwind(8)'s response should be trustworthy.  At least that's what I see
> when I test unwind with "preference { dhcp }".
>

unwind always does its own validation or decides it can't do
validation. In both cases you can trust the AD flag.
The AD flag is set if and only if unwind did a successful validation.

You cannot trust the RCODE (and answer) when AD is *NOT* set. That
seems like a strange and obvious statement but it is the main
difference to validating unbound. When you enable validation in
unbound and someone messes with your packets you will get a SERVFAIL.

You can trick unwind into believing that DNSSEC validation is not
possible in your current network location. It will still resolve but
it will NOT set AD.

In the context of the current discussion this means that you can't
validate SSHFPs, but you can still try to connect to your ssh server
if you have the fingerprint in known_hosts. The later is still save
even if people play tricks with DNS in your current location.
 
> And then there's the asr fallback, tested with "preference { stub }".
> unwind(8) allocates its own asr context, ignoring the global
> /etc/resolv.conf and thus "options trust-ad".  IIUC it then hands off

Yes. But even if it did not it would not matter since it will clear
AD.

> the answer to libunbound which cooks its own soup.  So it looks like

No, the asr fallback is there for when your local network is truly
fucked - think last century equipment that knows DNS is udp/53 only and
max 512 bytes. While we already lower standards considerably in unwind
when we use libunbound compared to unbound the network still needs to
provide some things. Working edns0 is one of them, it's near
impossible to turn of in libunbound.

> unwind needs no special code to disable/ignore "options trust-ad".
> 
> I have no idea how unwind/libunbound behaves when using forwarders, the
> "accept bogus" feature, (opportunistic) DoT or other stuff I'm probably

Nope, as stated above, you can trust AD.

> overlooking.  But AFAICS using unwind + "options trust-ad" brings no
> problem.  cc'ing Florian, input welcome.
> 
> I think the issue with trust-ad is the list of resolvers that end up in
> resolv.conf.  I would expect people who use a resolver on localhost to
> also have other resolvers listed in their /etc/resolv.conf, because of
> a conscious choice (bsd.rd) or just because it should be more reliable.
> It seems to me that the resolv.conf(5) manpage addition should cover
> that.  Or should I make it more explicit?
> 
> Index: share/man/man5/resolv.conf.5
> ===
> RCS file: /d/cvs/src/share/man/man5/resolv.conf.5,v
> retrieving revision 1.60
> diff -u -p -r1.60 resolv.conf.5
> --- share/man/man5/resolv.conf.5  25 Apr 2020 14:22:04 -  1.60
> +++ share/man/man5/resolv.conf.5  25 Jul 2020 02:08:17 -
> @@ -285,6 +285,15 @@ first as an absolute name before any sea
>  .It Cm tcp
>  Forces the use of TCP for queries.
>  Normal behaviour is to query via UDP but fall back to TCP on failure.
> +.It Cm trust-ad
> +Sets the AD flag in DNS queries, and preserve the value of the AD flag
> +in DNS replies (this flag is stripped by default).
> +Useful when applications want to ensure that the DNS resources they
> +look up have been signed with DNSSEC and validated by the
> +.Ic nameserver .
> +This option should be used only if the transport between the
> +.Ic nameserver
> +and the resolver is secure.

The last sentence is the problem in my opinion. trust-ad is a button
mere mortals cannot push.

The typical answer is: Oh, I use a vpn tunel and my nameserver is
behind the tunnel. Does the tunel terminate on the nameserver?
My enterprise vpn hands me 4 resolvers. They are in a different
subnet than the vpn concentrator.

>  .El
>  .El
>  .Pp
> 
> Lazy me would be tempted to just let asr trust 127.0.0.1 / ::1 by

probably only save if you know that you are talking to certain
software that does it's own validation (unwind, unbound). Not save if
you are talking to say dnscrypt-proxy.

> default and let others deal with the edge cases, but where is the fun in
> that... :)
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 

-- 
I'm not entirely sure you are real.



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Theo de Raadt
Florian Obser  wrote:

> On Wed, Jul 29, 2020 at 03:51:17PM +0200, Sebastian Benoit wrote:
> > If i remember correctly, the fallout was caused by EDNS but i might be
> > wrong. The unbound commit caused a developer some headscratching, because
> > his upstream internet did not work with such packets, which led to immediate
> > backout of the change, because a default config that does not work is not
> > good.
> 
> It was time. Running DNSSEC validation on a system without an RTC is
> not a good idea. NTP could not fix this because it depends on working
> DNS. This has since been addressed by Otto.

Addressed as well as possible, but perfection is not achievable.

ntpd (which is run by default) tries aggressively to repair the clock at
boot.  Later on, it is not so aggressive at correcting the clock.

For real computers, this is of no concern, it will work fine.

But on machines without a battery RTC, if internet operation is weak at
boot, but becomes better later on, there is still a window of time
(before ntpd slowly corrects the clock) where time is incoherent and
weird stuff can happen.

So my recommendation is to buy real computers.  And garbage can misbehave
like garbage will.



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Klemens Nanni
On Wed, Jul 29, 2020 at 09:05:14AM -0600, Theo de Raadt wrote:
> Claudio Jeker  wrote:
> > But:
> > $ route -T2 exec id -R
> > 2
> > $ route -T2 exec route -T0 exec id -R
> > route: setrtable: Operation not permitted
> > 
> > Only root can change the rdomain if it is currently != 0.
> 
> That worry was stated in my email, but not so accurately, thank you.
> So now you can't make a rdomain-0 !command in the global scope.
Indeed, my example was incomplete, but as netstart(8) runs as root this
is not a problem - unless of course `!' commands do stuff as
unprivileged users in foreign routing domains.

With that in mind, I'm getting more convinced that forcing the routing
domain in hostname.if(5) is not feasible.



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Florian Obser
On Wed, Jul 29, 2020 at 03:51:17PM +0200, Sebastian Benoit wrote:
> If i remember correctly, the fallout was caused by EDNS but i might be
> wrong. The unbound commit caused a developer some headscratching, because
> his upstream internet did not work with such packets, which led to immediate
> backout of the change, because a default config that does not work is not
> good.

It was time. Running DNSSEC validation on a system without an RTC is
not a good idea. NTP could not fix this because it depends on working
DNS. This has since been addressed by Otto.

The edns problem is well understood and has nothing to do with turning
DNSSEC validation on in unbound since unbound always sends an edns0
option. So if your network sucks so badly that you can't edns0 you
can't use unbound, period.

-- 
I'm not entirely sure you are real.



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Theo de Raadt
Claudio Jeker  wrote:

> On Wed, Jul 29, 2020 at 04:43:18PM +0200, Klemens Nanni wrote:
> > On Wed, Jul 29, 2020 at 05:33:14PM +0300, Kapetanakis Giannis wrote:
> > > Wouldn't this break those who already have
> > > !route -T2 
> > > 
> > > in their hostname.if files?
> > No,
> > 
> > $ route -T1 exec id -R
> > 1
> > $ route -T0 exec route -T1 exec id -R
> > 1
> 
> But:
>   $ route -T2 exec id -R
>   2
>   $ route -T2 exec route -T0 exec id -R
>   route: setrtable: Operation not permitted
> 
> Only root can change the rdomain if it is currently != 0.

That worry was stated in my email, but not so accurately, thank you.
So now you can't make a rdomain-0 !command in the global scope.

And if we start playing with "early !commands run in 0", that gets
even more messy.



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Kapetanakis Giannis
On 29/07/2020 17:43, Klemens Nanni wrote:
> On Wed, Jul 29, 2020 at 05:33:14PM +0300, Kapetanakis Giannis wrote:
>> Wouldn't this break those who already have
>> !route -T2 
>>
>> in their hostname.if files?
> No,
>
>   $ route -T1 exec id -R
>   1
>   $ route -T0 exec route -T1 exec id -R
>   1
>
you're right,

Also verified with
route -T0 route -T1 add

G



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Claudio Jeker
On Wed, Jul 29, 2020 at 04:43:18PM +0200, Klemens Nanni wrote:
> On Wed, Jul 29, 2020 at 05:33:14PM +0300, Kapetanakis Giannis wrote:
> > Wouldn't this break those who already have
> > !route -T2 
> > 
> > in their hostname.if files?
> No,
> 
>   $ route -T1 exec id -R
>   1
>   $ route -T0 exec route -T1 exec id -R
>   1

But:
$ route -T2 exec id -R
2
$ route -T2 exec route -T0 exec id -R
route: setrtable: Operation not permitted

Only root can change the rdomain if it is currently != 0.

-- 
:wq Claudio



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Theo de Raadt
You were already able to execute a !command inside the rdomain, either
by specifying the rdomain (on commands which can do that) or by using
route -T manually.

But now, you can't easily execute commands *outside the rdomain*, and
there are some things folk might want to do.

Also, there is an order of evaluation.  Commands before the rdomain
keywords are outside, but commands afterwards are in the rdomain.  That
troubles me a little, especially becuase it's another piece which will
be difficult to document.

Matthieu Herrb  wrote:

> Hi,
> 
> When I'm configuring an interface with a spécific rdomain, I'd assume
> that '!' commands (especially /sbin/route commands) are executed in
> the rdomain for this interface.
> 
> I know that parsing this file is complex and somehow fragile but still
> I tried to write a patch.
> 
> What do you think ?
> 
> Of course I'm ok with any enhancements / fixes to my shell foo.
> 
> --- netstart.orig Wed Jul 29 11:19:53 2020
> +++ netstart  Wed Jul 29 11:52:39 2020
> @@ -67,8 +67,16 @@
>   _cmds[${#_cmds[*]}]="ifconfig $_if ${_c[@]} up;dhclient $_if"
>   V4_DHCPCONF=true
>   ;;
> + rdomain) ((${#_c[*]} == 2)) || return
> + _cmds[${#_cmds[*]}]="ifconfig $_if rdomain ${_c[_name]}"
> + _rdomain=${_c[_name]}
> + ;;
>   '!'*)   _cmd=$(print -- "${_c[@]}" | sed 's/\$if/'$_if'/g')
> - _cmds[${#_cmds[*]}]="${_cmd#!}"
> + if [[ $_rdomain -ne 0 ]]; then
> +_cmds[${#_cmds[*]}]="/sbin/route -T$_rdomain exec 
> ${_cmd#!}"
> + else
> +_cmds[${#_cmds[*]}]="${_cmd#!}"
> + fi
>   ;;
>   *)  _cmds[${#_cmds[*]}]="ifconfig $_if ${_c[@]}"
>   ;;
> 
> -- 
> Matthieu Herrb
> 



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Klemens Nanni
On Wed, Jul 29, 2020 at 05:33:14PM +0300, Kapetanakis Giannis wrote:
> Wouldn't this break those who already have
> !route -T2 
> 
> in their hostname.if files?
No,

$ route -T1 exec id -R
1
$ route -T0 exec route -T1 exec id -R
1



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Sebastian Benoit
Jeremie Courreges-Anglas(j...@wxcvbn.org) on 2020.07.29 15:51:55 +0200:
> On Sun, Jul 26 2020, Jeremie Courreges-Anglas  wrote:
> > On Sat, Jul 25 2020, Sebastian Benoit  wrote:
> 
> [...]
> 
> >> If you enable trust-ad on a system that moves around, e.g. your laptop, you
> >> will experience failures, which is why unwind tests for this and falls back
> >> to non-trusting dhcp learned resolvers in such cases. In fact it also falls
> >> back to the stub resolver, i.e. our libc resolver. Which makes me think 
> >> that
> >> trust-ad should not be used when you run unwind (because the whole point of
> >> the fallback is that dnssec does not work). But thats a documentation 
> >> issue.
> 
> One important thing to keep in mind: trust-ad is not so much about being
> able to do DNSSEC validation.  It's about trusting the validation done
> by the resolver(s).  So if the resolver can't do validation and ensures
> that AD flag is clear in the reply there is no breach of contract.
> 
> > Thanks for pointing this out.  I would expect from unwind(8) that it
> > always clears the AD bit from its responses if it could not validate
> > them.  Inclusing when it falls back to dynamically learned resolvers or
> > the libc resolver.
> 
> So I did a few tests and read some unwind(8) code, and it *appears* safe
> to use unwind(8) along with "options trust-ad".
> 
> First you mention fallback to DHCP-learned resolvers.  Those you should
> probably not trust indeed, but it looks like unwind(8) attempts to use
> them to perform its own validation.  So the value of the AD flag in
> unwind(8)'s response should be trustworthy.  At least that's what I see
> when I test unwind with "preference { dhcp }".
> 
> And then there's the asr fallback, tested with "preference { stub }".
> unwind(8) allocates its own asr context, ignoring the global
> /etc/resolv.conf and thus "options trust-ad".  IIUC it then hands off
> the answer to libunbound which cooks its own soup.  So it looks like
> unwind needs no special code to disable/ignore "options trust-ad".

you are right, i forgot about that.
 
> I have no idea how unwind/libunbound behaves when using forwarders, the
> "accept bogus" feature, (opportunistic) DoT or other stuff I'm probably
> overlooking.  But AFAICS using unwind + "options trust-ad" brings no
> problem.  cc'ing Florian, input welcome.
> 
> I think the issue with trust-ad is the list of resolvers that end up in
> resolv.conf.  I would expect people who use a resolver on localhost to
> also have other resolvers listed in their /etc/resolv.conf, because of
> a conscious choice (bsd.rd) or just because it should be more reliable.
> It seems to me that the resolv.conf(5) manpage addition should cover
> that.  Or should I make it more explicit?

I think its a good start.

Fwiw i dont have other resolvers except localhost in resolv.conf, on laptops
(all running unwind) as well as my gateway (running unbound). (Server type
systems are a different thing of course). If the local resolver does not
work, i want to notice it right away.

> 
> Index: share/man/man5/resolv.conf.5
> ===
> RCS file: /d/cvs/src/share/man/man5/resolv.conf.5,v
> retrieving revision 1.60
> diff -u -p -r1.60 resolv.conf.5
> --- share/man/man5/resolv.conf.5  25 Apr 2020 14:22:04 -  1.60
> +++ share/man/man5/resolv.conf.5  25 Jul 2020 02:08:17 -
> @@ -285,6 +285,15 @@ first as an absolute name before any sea
>  .It Cm tcp
>  Forces the use of TCP for queries.
>  Normal behaviour is to query via UDP but fall back to TCP on failure.
> +.It Cm trust-ad
> +Sets the AD flag in DNS queries, and preserve the value of the AD flag
> +in DNS replies (this flag is stripped by default).
> +Useful when applications want to ensure that the DNS resources they
> +look up have been signed with DNSSEC and validated by the
> +.Ic nameserver .
> +This option should be used only if the transport between the
> +.Ic nameserver
> +and the resolver is secure.
>  .El
>  .El
>  .Pp
> 
> Lazy me would be tempted to just let asr trust 127.0.0.1 / ::1 by
> default and let others deal with the edge cases, but where is the fun in
> that... :)
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Kapetanakis Giannis
On 29/07/2020 12:54, Matthieu Herrb wrote:
> Hi,
>
> When I'm configuring an interface with a spécific rdomain, I'd assume
> that '!' commands (especially /sbin/route commands) are executed in
> the rdomain for this interface.
>
> I know that parsing this file is complex and somehow fragile but still
> I tried to write a patch.
>
> What do you think ?
>
> Of course I'm ok with any enhancements / fixes to my shell foo.
>
> --- netstart.orig Wed Jul 29 11:19:53 2020
> +++ netstart  Wed Jul 29 11:52:39 2020
> @@ -67,8 +67,16 @@
>   _cmds[${#_cmds[*]}]="ifconfig $_if ${_c[@]} up;dhclient $_if"
>   V4_DHCPCONF=true
>   ;;
> + rdomain) ((${#_c[*]} == 2)) || return
> + _cmds[${#_cmds[*]}]="ifconfig $_if rdomain ${_c[_name]}"
> + _rdomain=${_c[_name]}
> + ;;
>   '!'*)   _cmd=$(print -- "${_c[@]}" | sed 's/\$if/'$_if'/g')
> - _cmds[${#_cmds[*]}]="${_cmd#!}"
> + if [[ $_rdomain -ne 0 ]]; then
> +_cmds[${#_cmds[*]}]="/sbin/route -T$_rdomain exec 
> ${_cmd#!}"
> + else
> +_cmds[${#_cmds[*]}]="${_cmd#!}"
> + fi
>   ;;
>   *)  _cmds[${#_cmds[*]}]="ifconfig $_if ${_c[@]}"
>   ;;
>
Wouldn't this break those who already have
!route -T2 

in their hostname.if files?

G



Re: VFS vgone(l) manpage vs. code

2020-07-29 Thread Jason McIntyre
On Wed, Jul 29, 2020 at 11:37:03AM +, Schreilechner, Dominik wrote:
> Hi,
> 
> I noticed, that the description of vgone/vgonel in the manpage does not match 
> the code.
> The manpage (https://man.openbsd.org/vgone) states:
> The difference between vgone() and vgonel() is that vgone() locks the vnode 
> interlock and then calls vgonel() while vgonel() expects the interlock to 
> already be locked.
> 
> However, vgone simply calls vgonel with curproc (see vfs_subr.c).
> void
> vgone(struct vnode *vp)
> {
> struct proc *p = curproc;
> vgonel(vp, p);
> }
> 
> Best regards,
> Dominik
> 

hi.

it would be easier to get things improved if you attached a diff
describing what you think is correct. even failing a diff, a piece of
text saying what you think it should be would increase your chances.

jmc



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Jeremie Courreges-Anglas
On Sun, Jul 26 2020, Jeremie Courreges-Anglas  wrote:
> On Sat, Jul 25 2020, Sebastian Benoit  wrote:

[...]

>> If you enable trust-ad on a system that moves around, e.g. your laptop, you
>> will experience failures, which is why unwind tests for this and falls back
>> to non-trusting dhcp learned resolvers in such cases. In fact it also falls
>> back to the stub resolver, i.e. our libc resolver. Which makes me think that
>> trust-ad should not be used when you run unwind (because the whole point of
>> the fallback is that dnssec does not work). But thats a documentation issue.

One important thing to keep in mind: trust-ad is not so much about being
able to do DNSSEC validation.  It's about trusting the validation done
by the resolver(s).  So if the resolver can't do validation and ensures
that AD flag is clear in the reply there is no breach of contract.

> Thanks for pointing this out.  I would expect from unwind(8) that it
> always clears the AD bit from its responses if it could not validate
> them.  Inclusing when it falls back to dynamically learned resolvers or
> the libc resolver.

So I did a few tests and read some unwind(8) code, and it *appears* safe
to use unwind(8) along with "options trust-ad".

First you mention fallback to DHCP-learned resolvers.  Those you should
probably not trust indeed, but it looks like unwind(8) attempts to use
them to perform its own validation.  So the value of the AD flag in
unwind(8)'s response should be trustworthy.  At least that's what I see
when I test unwind with "preference { dhcp }".

And then there's the asr fallback, tested with "preference { stub }".
unwind(8) allocates its own asr context, ignoring the global
/etc/resolv.conf and thus "options trust-ad".  IIUC it then hands off
the answer to libunbound which cooks its own soup.  So it looks like
unwind needs no special code to disable/ignore "options trust-ad".

I have no idea how unwind/libunbound behaves when using forwarders, the
"accept bogus" feature, (opportunistic) DoT or other stuff I'm probably
overlooking.  But AFAICS using unwind + "options trust-ad" brings no
problem.  cc'ing Florian, input welcome.

I think the issue with trust-ad is the list of resolvers that end up in
resolv.conf.  I would expect people who use a resolver on localhost to
also have other resolvers listed in their /etc/resolv.conf, because of
a conscious choice (bsd.rd) or just because it should be more reliable.
It seems to me that the resolv.conf(5) manpage addition should cover
that.  Or should I make it more explicit?

Index: share/man/man5/resolv.conf.5
===
RCS file: /d/cvs/src/share/man/man5/resolv.conf.5,v
retrieving revision 1.60
diff -u -p -r1.60 resolv.conf.5
--- share/man/man5/resolv.conf.525 Apr 2020 14:22:04 -  1.60
+++ share/man/man5/resolv.conf.525 Jul 2020 02:08:17 -
@@ -285,6 +285,15 @@ first as an absolute name before any sea
 .It Cm tcp
 Forces the use of TCP for queries.
 Normal behaviour is to query via UDP but fall back to TCP on failure.
+.It Cm trust-ad
+Sets the AD flag in DNS queries, and preserve the value of the AD flag
+in DNS replies (this flag is stripped by default).
+Useful when applications want to ensure that the DNS resources they
+look up have been signed with DNSSEC and validated by the
+.Ic nameserver .
+This option should be used only if the transport between the
+.Ic nameserver
+and the resolver is secure.
 .El
 .El
 .Pp

Lazy me would be tempted to just let asr trust 127.0.0.1 / ::1 by
default and let others deal with the edge cases, but where is the fun in
that... :)

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Sebastian Benoit
Jeremie Courreges-Anglas(j...@wxcvbn.org) on 2020.07.26 22:52:47 +0200:
> On Sat, Jul 25 2020, Sebastian Benoit  wrote:
> > Jeremie Courreges-Anglas(j...@wxcvbn.org) on 2020.07.25 14:01:06 +0200:
> >> On Fri, Jul 17 2020, Jesper Wallin  wrote:
> >> > Hi all,
> >> >
> >> > I recently decided to add SSHFP records for my servers, since I never
> >> > memorize or write down my key fingerprints.  I learned that if I
> >> > want ssh(1) to trust these records, DNSSEC needs to be enabled for my
> >> > zone.  To validate these records, ssh(1) is using getrrsetbyname(3),
> >> > which checks if the AD (Authentic Data) flag is set in the response.
> >> >
> >> > To get a response with the AD flag set, the request itself also needs
> >> > to have the AD flag set.  It turns out that getrrsetbyname(3) doesn't
> >> > set this and will therefore never get a validated response, unless the
> >> > resolver is configured to always send it, no matter what the client
> >> > requested.
> >> >
> >> > It seems like unwind(8) behaves this way but it also responds with the
> >> > RRSIG records, which is extra overhead and ignored by getrrsetbyname(3).
> >> >
> >> > This was mentioned a few years ago [0] and the solution suggested was
> >> > to add the RES_USE_DNSSEC to _res.options, which also makes the resolver
> >> > respond with the extra RRSIG records.
> >> >
> >> > Instead, by only setting the AD flag, both the request and the response
> >> > has the same size as without the flag set.  The patch below will add
> >> > RES_USE_AD as an option to _res.options and set it by default.
> >> > This is also the default behaviour in dig(1), which I understand is a
> >> > bit different, but that sure added some confusion while debugging this.
> >> >
> >> > This let you run unbound(8) or any other validating resolver on your
> >> > local network and getrrsetbyname(3) will trust it.  Do read the CAVEATS
> >> > in the manual of getrrsetbyname(3) though.
> >> >
> >> > As a side note, I noticed that the default value of _res.options was the
> >> > same value as RES_DEFAULT, so I changed it to RES_DEFAULT instead, for
> >> > the sake of consistency.
> >> >
> >> > Thoughts?
> >> 
> >> Thanks for addressing this longstanding issue.
> >> 
> >> I think using the AD bit in queries is a good idea. IIRC Peter J.
> >> Philipp (cc'd) suggested using it but I was not thrilled because:
> >> 
> >> 1. the meaning of the AD bit in queries is relatively recent (2013
> >>   I think[0])
> >> 2. getrrsetbyname also collects signature records, and for this you need
> >>   EDNS + the DO bit set.  Implementing this in was not 100% trivial,
> >>   I think we had something working but Eric or I were not 100% happy
> >>   with it.
> >> 
> >> 1. is probably not a concern, after all you're supposed to use
> >> a trustworthy resolver, which should be modern and understand the
> >> purpose of the AD bit in queries.
> >> 
> >> 2. is probably not a concern either.  I guess that all getrrsetbyname(3)
> >> callers only care about the target records and the AD bit, not about the
> >> signature records.  (Why would they use it for anyway?).  In the base
> >> system, only ssh(1) and traceroute(8) use getrrsetbyname(3).
> >> AFAIK no other system provides getrrsetbyname(3), and ISC has removed
> >> getrrsetbyname and the whole lwres API in 2017[1].  So I'd say we're
> >> free to improve our version of getrrsetbyname(3) as we see fit.
> >
> > This is a concern for the stub resolver, because edns and AD does not work
> > everywhere.
> > Indeed when we switched unbound to validate by default we
> > learned that this is not a good idea for everyone - which lead to the
> > development of unwind(8).
> 
> Do you by chance have any data regarding fallout because of the AD bit
> set in queries?  I would expect it to be ignored when not supported.
> EDNS and the DNSSEC DO bit is a different story indeed.

If i remember correctly, the fallout was caused by EDNS but i might be
wrong. The unbound commit caused a developer some headscratching, because
his upstream internet did not work with such packets, which led to immediate
backout of the change, because a default config that does not work is not
good.

In that regard i worry about the resolv.conf option: it has the potential to
break peoples DNS in non obvious ways. At least dont advertise it as a way
to "enable dnssec" for the whole system, instead point people to unwind.



Re: hostname.if '!' commands and rdomains

2020-07-29 Thread Klemens Nanni
On Wed, Jul 29, 2020 at 11:54:17AM +0200, Matthieu Herrb wrote:
> When I'm configuring an interface with a spécific rdomain, I'd assume
> that '!' commands (especially /sbin/route commands) are executed in
> the rdomain for this interface.
I see where you're coming from, but the diff seems flawed.

I don't have a better approach for this at hand, but as it stands, I'm
happy with having to do `route -T1 exec' manually if need be.

> I know that parsing this file is complex and somehow fragile but still
> I tried to write a patch.
In any way, /usr/src/distrib/miniroot/install.sub:parse_hn_line() will
require the same treatment to keep netstart(8) and installer in sync.

Also, if something like this went in, I'd like to see hostname.if(5)
explicitly document `!' behaviour wrt. to the routing domain.

> --- netstart.orig Wed Jul 29 11:19:53 2020
> +++ netstart  Wed Jul 29 11:52:39 2020
> @@ -67,8 +67,16 @@
>   _cmds[${#_cmds[*]}]="ifconfig $_if ${_c[@]} up;dhclient $_if"
>   V4_DHCPCONF=true
>   ;;
> + rdomain) ((${#_c[*]} == 2)) || return
> + _cmds[${#_cmds[*]}]="ifconfig $_if rdomain ${_c[_name]}"
> + _rdomain=${_c[_name]}
> + ;;
This assumes `rdomain' is set on its own hostname.if(5) line, so config
like "up rdomain 1" won't set `_rdomain' here...

>   '!'*)   _cmd=$(print -- "${_c[@]}" | sed 's/\$if/'$_if'/g')
> - _cmds[${#_cmds[*]}]="${_cmd#!}"
> + if [[ $_rdomain -ne 0 ]]; then
> +_cmds[${#_cmds[*]}]="/sbin/route -T$_rdomain exec 
> ${_cmd#!}"
> + else
> +_cmds[${#_cmds[*]}]="${_cmd#!}"
> + fi
>   ;;
and that causes the interface to be in rdomain 1 while executing
in rdomain 0.



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Jesper Wallin
On Wed, Jul 29, 2020 at 02:46:06PM +0200, Jeremie Courreges-Anglas wrote:
> On Mon, Jul 27 2020, Jesper Wallin  wrote:
> 
> [...]
> 
> > I still think the RES_USE_AD option might be a useful though, for when
> > you want to decide on an application-level whether to trust AD or not?
> 
> RES_TRUSTAD can also be used for this, but as the proposed documentation
> points out it would be better to let the admin decide.  One
> (dis)advantage of using that name is that some external applications
> already grok it (eg ports/mail/postfix).
> 

Oh, of course! My bad. *facepalm*

I decided to use RES_USE_AD since the last 4 options in res_init(3) was
using that naming convension.  Also, it would be silly to "use ad"
without actually trusting it? :-)



Re: ssh(1), getrrsetbyname(3), SSHFP and DNSSEC

2020-07-29 Thread Jeremie Courreges-Anglas
On Mon, Jul 27 2020, Jesper Wallin  wrote:

[...]

> I still think the RES_USE_AD option might be a useful though, for when
> you want to decide on an application-level whether to trust AD or not?

RES_TRUSTAD can also be used for this, but as the proposed documentation
points out it would be better to let the admin decide.  One
(dis)advantage of using that name is that some external applications
already grok it (eg ports/mail/postfix).

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



ifconfig: merge switch_status() into bridge_status()

2020-07-29 Thread Klemens Nanni
This is to reduce duplicate code and pave the way for a single
bridge_status() that covers all bridge like interfaces: bridge(4),
switch(4) and tpmr(4).

Feedback? OK?

Index: brconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/brconfig.c,v
retrieving revision 1.26
diff -u -p -r1.26 brconfig.c
--- brconfig.c  29 Jul 2020 12:13:28 -  1.26
+++ brconfig.c  29 Jul 2020 12:17:04 -
@@ -54,6 +54,7 @@ void bridge_ifclrflag(const char *, u_in
 
 void bridge_list(char *);
 void bridge_cfg(const char *);
+void switch_cfg(const char *);
 void bridge_badrule(int, char **, int);
 void bridge_showrule(struct ifbrlreq *);
 int is_switch(void);
@@ -778,10 +779,16 @@ void
 bridge_status(void)
 {
struct ifbrparam bp1, bp2;
+   int isswitch = is_switch();
 
-   if (!is_bridge() || is_switch())
+   if (!is_bridge())
return;
 
+   if (isswitch)
+   switch_cfg("\t");
+   else
+   bridge_cfg("\t");
+
bridge_cfg("\t");
 
bridge_list("\t");
@@ -789,6 +796,9 @@ bridge_status(void)
if (aflag && !ifaliases)
return;
 
+   if (isswitch)
+   return;
+
strlcpy(bp1.ifbrp_name, ifname, sizeof(bp1.ifbrp_name));
if (ioctl(sock, SIOCBRDGGCACHE, (caddr_t)) == -1)
return;
@@ -1146,8 +1156,8 @@ is_switch()
return (1);
 }
 
-static void
-switch_cfg(char *delim)
+void
+switch_cfg(const char *delim)
 {
struct ifbrparam bp;
 
@@ -1168,20 +1178,6 @@ switch_cfg(char *delim)
err(1, "%s", ifname);
 
printf(" maxgroup %d\n", bp.ifbrp_maxgroup);
-}
-
-void
-switch_status(void)
-{
-   if (!is_switch())
-   return;
-
-   switch_cfg("\t");
-
-   bridge_list("\t");
-
-   if (aflag && !ifaliases)
-   return;
 }
 
 void
Index: ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.424
diff -u -p -r1.424 ifconfig.c
--- ifconfig.c  3 Jul 2020 17:42:50 -   1.424
+++ ifconfig.c  29 Jul 2020 12:17:06 -
@@ -3507,7 +3507,6 @@ status(int link, struct sockaddr_dl *sdl
phys_status(0);
 #ifndef SMALL
bridge_status();
-   switch_status();
 #endif
 }
 
Index: ifconfig.h
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.h,v
retrieving revision 1.2
diff -u -p -r1.2 ifconfig.h
--- ifconfig.h  24 Oct 2019 18:54:10 -  1.2
+++ ifconfig.h  29 Jul 2020 12:17:06 -
@@ -69,7 +69,6 @@ void bridge_flushrule(const char *, int)
 int is_bridge(void);
 void bridge_status(void);
 int bridge_rule(int, char **, int);
-void switch_status(void);
 void switch_datapathid(const char *, int);
 void switch_portno(const char *, const char *);
 



VFS vgone(l) manpage vs. code

2020-07-29 Thread Schreilechner, Dominik
Hi,

I noticed, that the description of vgone/vgonel in the manpage does not match 
the code.
The manpage (https://man.openbsd.org/vgone) states:
The difference between vgone() and vgonel() is that vgone() locks the vnode 
interlock and then calls vgonel() while vgonel() expects the interlock to 
already be locked.

However, vgone simply calls vgonel with curproc (see vfs_subr.c).
void
vgone(struct vnode *vp)
{
struct proc *p = curproc;
vgonel(vp, p);
}

Best regards,
Dominik



Re: acpicpu(4) and ACPI0007

2020-07-29 Thread Jonathan Matthew
On Wed, Jul 29, 2020 at 10:06:14AM +0200, Mark Kettenis wrote:
> > Date: Wed, 29 Jul 2020 10:38:55 +1000
> > From: Jonathan Matthew 
> > 
> > On Tue, Jul 28, 2020 at 07:30:36PM +0200, Mark Kettenis wrote:
> > > > Date: Tue, 28 Jul 2020 21:42:46 +1000
> > > > From: Jonathan Matthew 
> > > > 
> > > > On Tue, Jul 28, 2020 at 11:12:21AM +0200, Mark Kettenis wrote:
> > > > > > Date: Tue, 28 Jul 2020 13:46:34 +1000
> > > > > > From: Jonathan Matthew 
> > > > > > 
> > > > > > On Mon, Jul 27, 2020 at 05:16:47PM +0200, Mark Kettenis wrote:
> > > > > > > > Date: Mon, 27 Jul 2020 17:02:41 +0200 (CEST)
> > > > > > > > From: Mark Kettenis 
> > > > > > > > 
> > > > > > > > Recent ACPI versions have deprecated "Processor()" nodes in 
> > > > > > > > favout of
> > > > > > > > "Device()" nodes with a _HID() method that returns "ACPI0007".  
> > > > > > > > This
> > > > > > > > diff tries to support machines with firmware that implements 
> > > > > > > > this.  If
> > > > > > > > you see something like:
> > > > > > > > 
> > > > > > > >   "ACPI0007" at acpi0 not configured
> > > > > > > > 
> > > > > > > > please try the following diff and report back with an updated 
> > > > > > > > dmesg.
> > > > > > > > 
> > > > > > > > Cheers,
> > > > > > > > 
> > > > > > > > Mark
> > > > > > > 
> > > > > > > And now with the right diff...
> > > > > > 
> > > > > > On a dell r6415, it looks like this:
> > > > > > 
> > > > > > acpicpu0 at acpi0copyvalue: 6: C1(@1 halt!)
> > > > > > all the way up to
> > > > > > acpicpu127 at acpi0copyvalue: 6: no cpu matching ACPI ID 127
> > > > > > 
> > > > > > which I guess means aml_copyvalue() needs to learn how to copy 
> > > > > > AML_OBJTYPE_DEVICE.
> > > > > 
> > > > > Yes.  It is not immediately obvious how this should work.  Do we need
> > > > > to copy the aml_node pointer or not?  We don't do that for
> > > > > AML_OBJTYPE_PROCESSOR and AML_OBJTYPE_POWERRSRC types which are
> > > > > similar to AML_OBJTYPE_DEVICE.  But AML_OBJTYPE_DEVICE object don't
> > > > > carry any additional information.  So we end up with just an empty
> > > > > case to avoid the warning.
> > > > > 
> > > > > Does this work on the Dell machines?
> > > > 
> > > > We've seen crashes in pool_cache_get() in various places after all the 
> > > > acpicpus
> > > > attach, which we haven't seen before on these machines, so I think it's
> > > > corrupting memory somehow.
> > > 
> > > Does that happen with only the acpicpu(4) diff?
> > 
> > Yes.  Looking at this a bit more, in the case where aml_evalnode() can't
> > copy the result value, it leaves it uninitialised, which means we'll call
> > aml_freevalue() where res is stack junk.  memset(, 0, sizeof(res))
> > seems to fix it.
> 
> Eh, where exactly?

I had it just before the call to aml_evalnode(), but that can't be it,
since aml_evalnode() does the same thing.

> 
> > > > With this addition, we get this for each cpu:
> > > > acpicpu0 at acpi0: C1(@1 halt!)
> > > 
> > > The exclamation mark indicates that this is the "fallback" C-state.
> > > Is there a _CST method at all?
> > > 
> > > Anyway, given that this is a server system, it isn't really surprising
> > > that there isn't any fancy power saving stuff.
> > 
> > Right, there doesn't seem to be any.  The processor devices look like this
> > in the aml:
> > 
> > Scope (_SB)
> > {
> > Device (C000)
> > {
> > Name (_HID, "ACPI0007" /* Processor Device */)  // _HID: 
> > Hardware ID
> > Name (_UID, 0x00)  // _UID: Unique ID
> > }
> > 
> > Device (C001)
> > {
> > Name (_HID, "ACPI0007" /* Processor Device */)  // _HID: 
> > Hardware ID
> > Name (_UID, 0x01)  // _UID: Unique ID
> > }
> > 
> >  .. and so on.
> 
> Usually there is an SSDT that fills in the details.  The acpidump
> output I have for the r6415 does have one. but it doesn't add
> anything.

Same here.

> 
> > > > > Index: dev/acpi/dsdt.c
> > > > > ===
> > > > > RCS file: /cvs/src/sys/dev/acpi/dsdt.c,v
> > > > > retrieving revision 1.252
> > > > > diff -u -p -r1.252 dsdt.c
> > > > > --- dev/acpi/dsdt.c   21 Jul 2020 03:48:06 -  1.252
> > > > > +++ dev/acpi/dsdt.c   28 Jul 2020 09:04:15 -
> > > > > @@ -996,6 +996,8 @@ aml_copyvalue(struct aml_value *lhs, str
> > > > >   lhs->v_objref = rhs->v_objref;
> > > > >   aml_addref(lhs->v_objref.ref, "");
> > > > >   break;
> > > > > + case AML_OBJTYPE_DEVICE:
> > > > > + break;
> > > > >   default:
> > > > >   printf("copyvalue: %x", rhs->type);
> > > > >   break;
> > > > 
> > > > 
> > 



hostname.if '!' commands and rdomains

2020-07-29 Thread Matthieu Herrb
Hi,

When I'm configuring an interface with a spécific rdomain, I'd assume
that '!' commands (especially /sbin/route commands) are executed in
the rdomain for this interface.

I know that parsing this file is complex and somehow fragile but still
I tried to write a patch.

What do you think ?

Of course I'm ok with any enhancements / fixes to my shell foo.

--- netstart.orig   Wed Jul 29 11:19:53 2020
+++ netstartWed Jul 29 11:52:39 2020
@@ -67,8 +67,16 @@
_cmds[${#_cmds[*]}]="ifconfig $_if ${_c[@]} up;dhclient $_if"
V4_DHCPCONF=true
;;
+   rdomain) ((${#_c[*]} == 2)) || return
+   _cmds[${#_cmds[*]}]="ifconfig $_if rdomain ${_c[_name]}"
+   _rdomain=${_c[_name]}
+   ;;
'!'*)   _cmd=$(print -- "${_c[@]}" | sed 's/\$if/'$_if'/g')
-   _cmds[${#_cmds[*]}]="${_cmd#!}"
+   if [[ $_rdomain -ne 0 ]]; then
+  _cmds[${#_cmds[*]}]="/sbin/route -T$_rdomain exec 
${_cmd#!}"
+   else
+  _cmds[${#_cmds[*]}]="${_cmd#!}"
+   fi
;;
*)  _cmds[${#_cmds[*]}]="ifconfig $_if ${_c[@]}"
;;

-- 
Matthieu Herrb



Re: acpicpu(4) and ACPI0007

2020-07-29 Thread Mark Kettenis
> Date: Wed, 29 Jul 2020 10:38:55 +1000
> From: Jonathan Matthew 
> 
> On Tue, Jul 28, 2020 at 07:30:36PM +0200, Mark Kettenis wrote:
> > > Date: Tue, 28 Jul 2020 21:42:46 +1000
> > > From: Jonathan Matthew 
> > > 
> > > On Tue, Jul 28, 2020 at 11:12:21AM +0200, Mark Kettenis wrote:
> > > > > Date: Tue, 28 Jul 2020 13:46:34 +1000
> > > > > From: Jonathan Matthew 
> > > > > 
> > > > > On Mon, Jul 27, 2020 at 05:16:47PM +0200, Mark Kettenis wrote:
> > > > > > > Date: Mon, 27 Jul 2020 17:02:41 +0200 (CEST)
> > > > > > > From: Mark Kettenis 
> > > > > > > 
> > > > > > > Recent ACPI versions have deprecated "Processor()" nodes in 
> > > > > > > favout of
> > > > > > > "Device()" nodes with a _HID() method that returns "ACPI0007".  
> > > > > > > This
> > > > > > > diff tries to support machines with firmware that implements 
> > > > > > > this.  If
> > > > > > > you see something like:
> > > > > > > 
> > > > > > >   "ACPI0007" at acpi0 not configured
> > > > > > > 
> > > > > > > please try the following diff and report back with an updated 
> > > > > > > dmesg.
> > > > > > > 
> > > > > > > Cheers,
> > > > > > > 
> > > > > > > Mark
> > > > > > 
> > > > > > And now with the right diff...
> > > > > 
> > > > > On a dell r6415, it looks like this:
> > > > > 
> > > > > acpicpu0 at acpi0copyvalue: 6: C1(@1 halt!)
> > > > > all the way up to
> > > > > acpicpu127 at acpi0copyvalue: 6: no cpu matching ACPI ID 127
> > > > > 
> > > > > which I guess means aml_copyvalue() needs to learn how to copy 
> > > > > AML_OBJTYPE_DEVICE.
> > > > 
> > > > Yes.  It is not immediately obvious how this should work.  Do we need
> > > > to copy the aml_node pointer or not?  We don't do that for
> > > > AML_OBJTYPE_PROCESSOR and AML_OBJTYPE_POWERRSRC types which are
> > > > similar to AML_OBJTYPE_DEVICE.  But AML_OBJTYPE_DEVICE object don't
> > > > carry any additional information.  So we end up with just an empty
> > > > case to avoid the warning.
> > > > 
> > > > Does this work on the Dell machines?
> > > 
> > > We've seen crashes in pool_cache_get() in various places after all the 
> > > acpicpus
> > > attach, which we haven't seen before on these machines, so I think it's
> > > corrupting memory somehow.
> > 
> > Does that happen with only the acpicpu(4) diff?
> 
> Yes.  Looking at this a bit more, in the case where aml_evalnode() can't
> copy the result value, it leaves it uninitialised, which means we'll call
> aml_freevalue() where res is stack junk.  memset(, 0, sizeof(res))
> seems to fix it.

Eh, where exactly?

> > > With this addition, we get this for each cpu:
> > > acpicpu0 at acpi0: C1(@1 halt!)
> > 
> > The exclamation mark indicates that this is the "fallback" C-state.
> > Is there a _CST method at all?
> > 
> > Anyway, given that this is a server system, it isn't really surprising
> > that there isn't any fancy power saving stuff.
> 
> Right, there doesn't seem to be any.  The processor devices look like this
> in the aml:
> 
> Scope (_SB)
> {
> Device (C000)
> {
> Name (_HID, "ACPI0007" /* Processor Device */)  // _HID: Hardware 
> ID
> Name (_UID, 0x00)  // _UID: Unique ID
> }
> 
> Device (C001)
> {
> Name (_HID, "ACPI0007" /* Processor Device */)  // _HID: Hardware 
> ID
> Name (_UID, 0x01)  // _UID: Unique ID
> }
> 
>  .. and so on.

Usually there is an SSDT that fills in the details.  The acpidump
output I have for the r6415 does have one. but it doesn't add
anything.

> > > > Index: dev/acpi/dsdt.c
> > > > ===
> > > > RCS file: /cvs/src/sys/dev/acpi/dsdt.c,v
> > > > retrieving revision 1.252
> > > > diff -u -p -r1.252 dsdt.c
> > > > --- dev/acpi/dsdt.c 21 Jul 2020 03:48:06 -  1.252
> > > > +++ dev/acpi/dsdt.c 28 Jul 2020 09:04:15 -
> > > > @@ -996,6 +996,8 @@ aml_copyvalue(struct aml_value *lhs, str
> > > > lhs->v_objref = rhs->v_objref;
> > > > aml_addref(lhs->v_objref.ref, "");
> > > > break;
> > > > +   case AML_OBJTYPE_DEVICE:
> > > > +   break;
> > > > default:
> > > > printf("copyvalue: %x", rhs->type);
> > > > break;
> > > 
> > > 
>