Re: tsleep(9): add global "nowake" channel
On Thu, Dec 24, 2020 at 12:24:01AM +0100, Patrick Wildt wrote:
> Am Wed, Dec 23, 2020 at 05:04:23PM -0600 schrieb Scott Cheloha:
> > On Wed, Dec 23, 2020 at 02:42:18PM -0700, Theo de Raadt wrote:
> > > I agree.  This chunk below is really gross and does not follow the
> > > special wakeup channel metaphor.
> > >
> > > It is *entirely clear* that a channel called "nowake" has no wakeup.
> > > Like duh.
> > >
> > > > +/*
> > > > + * nowake is a global sleep channel for threads that do not want
> > > > + * to receive wakeup(9) broadcasts.
> > > > + */
> > > > +int __nowake;
> > > > +void *nowake = &__nowake;
> >
> > So we'll go with this?
> >
> > Index: kern/kern_synch.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_synch.c,v
> > retrieving revision 1.172
> > diff -u -p -r1.172 kern_synch.c
> > --- kern/kern_synch.c   7 Dec 2020 16:55:29 -    1.172
> > +++ kern/kern_synch.c   23 Dec 2020 23:03:31 -
> > @@ -87,6 +87,11 @@ sleep_queue_init(void)
> >          TAILQ_INIT([i]);
> >  }
> >
> > +/*
> > + * Global sleep channel for threads that do not want to
> > + * receive wakeup(9) broadcasts.
> > + */
> > +int nowake;
> >
> >  /*
> >   * During autoconfiguration or after a panic, a sleep will simply
> > @@ -119,6 +124,7 @@ tsleep(const volatile void *ident, int p
> >  #endif
> >
> >          KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
> > +        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
>
> Sure you compiled this?  ident is void *, nowake is int.  Should be ident
> != &nowake?  Same for the other code in the diff.

Whoops.  Thought I had compiled it.  Yes, I meant:

Index: kern/kern_synch.c
===
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.172
diff -u -p -r1.172 kern_synch.c
--- kern/kern_synch.c   7 Dec 2020 16:55:29 -    1.172
+++ kern/kern_synch.c   23 Dec 2020 23:12:10 -
@@ -87,6 +87,11 @@ sleep_queue_init(void)
         TAILQ_INIT([i]);
 }
 
+/*
+ * Global sleep channel for threads that do not want to
+ * receive wakeup(9) broadcasts.
+ */
+int nowake;
 
 /*
  * During autoconfiguration or after a panic, a sleep will simply
@@ -119,6 +124,7 @@ tsleep(const volatile void *ident, int p
 #endif
 
         KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
+        KASSERT(ident != &nowake || ISSET(priority, PCATCH) || timo != 0);
 
 #ifdef MULTIPROCESSOR
         KASSERT(timo || _kernel_lock_held());
@@ -213,6 +219,7 @@ msleep(const volatile void *ident, struc
 #endif
 
         KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
+        KASSERT(ident != &nowake || ISSET(priority, PCATCH) || timo != 0);
         KASSERT(mtx != NULL);
 
         if (priority & PCATCH)
@@ -301,6 +308,7 @@ rwsleep(const volatile void *ident, stru
         int error, status;
 
         KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
+        KASSERT(ident != &nowake || ISSET(priority, PCATCH) || timo != 0);
         rw_assert_anylock(rwl);
         status = rw_status(rwl);

Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.148
diff -u -p -r1.148 systm.h
--- sys/systm.h 26 Aug 2020 03:29:07 -    1.148
+++ sys/systm.h 23 Dec 2020 23:12:10 -
@@ -107,6 +107,8 @@ extern struct vnode *rootvp;    /* vnode eq
 extern dev_t swapdev;           /* swapping device */
 extern struct vnode *swapdev_vp;/* vnode equivalent to above */
 
+extern int nowake;              /* dead wakeup(9) channel */
+
 struct proc;
 struct process;
 #define curproc curcpu()->ci_curproc
Re: rasops1
Am Wed, Dec 23, 2020 at 11:32:58PM +0100 schrieb Frederic Cambus:
> Hi Mark,
>
> On Fri, Dec 18, 2020 at 10:33:52PM +0100, Mark Kettenis wrote:
>
> > The diff below disables the optimized functions on little-endian
> > architectures such that we always use rasops1_putchar().  This makes
> > ssdfb(4) work with the default 8x16 font on arm64.
>
> I noticed it was committed already, but it seems the following
> directives:
>
> #if defined(RASOPS_SMALL) && BYTE_ORDER == BIG_ENDIAN
>
> Should have been:
>
> #if !defined(RASOPS_SMALL) && BYTE_ORDER == BIG_ENDIAN
>
> We want to include the optimized putchar functions only if RASOPS_SMALL
> is not defined.

True that.  In one #endif comment he actually kept the !, but the actual
#ifs lost it.
Re: tsleep(9): add global "nowake" channel
Am Wed, Dec 23, 2020 at 05:04:23PM -0600 schrieb Scott Cheloha:
> On Wed, Dec 23, 2020 at 02:42:18PM -0700, Theo de Raadt wrote:
> > I agree.  This chunk below is really gross and does not follow the
> > special wakeup channel metaphor.
> >
> > It is *entirely clear* that a channel called "nowake" has no wakeup.
> > Like duh.
> >
> > > +/*
> > > + * nowake is a global sleep channel for threads that do not want
> > > + * to receive wakeup(9) broadcasts.
> > > + */
> > > +int __nowake;
> > > +void *nowake = &__nowake;
>
> So we'll go with this?
>
> Index: kern/kern_synch.c
> ===
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.172
> diff -u -p -r1.172 kern_synch.c
> --- kern/kern_synch.c   7 Dec 2020 16:55:29 -    1.172
> +++ kern/kern_synch.c   23 Dec 2020 23:03:31 -
> @@ -87,6 +87,11 @@ sleep_queue_init(void)
>          TAILQ_INIT([i]);
> }
>
> +/*
> + * Global sleep channel for threads that do not want to
> + * receive wakeup(9) broadcasts.
> + */
> +int nowake;
>
> /*
>  * During autoconfiguration or after a panic, a sleep will simply
> @@ -119,6 +124,7 @@ tsleep(const volatile void *ident, int p
> #endif
>
>          KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
> +        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);

Sure you compiled this?  ident is void *, nowake is int.  Should be
ident != &nowake?  Same for the other code in the diff.

> #ifdef MULTIPROCESSOR
>          KASSERT(timo || _kernel_lock_held());
> @@ -213,6 +219,7 @@ msleep(const volatile void *ident, struc
> #endif
>
>          KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
> +        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
>          KASSERT(mtx != NULL);
>
>          if (priority & PCATCH)
> @@ -301,6 +308,7 @@ rwsleep(const volatile void *ident, stru
>          int error, status;
>
>          KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
> +        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
>          rw_assert_anylock(rwl);
>          status = rw_status(rwl);
>
> Index: sys/systm.h
> ===
> RCS file: /cvs/src/sys/sys/systm.h,v
> retrieving revision 1.148
> diff -u -p -r1.148 systm.h
> --- sys/systm.h 26 Aug 2020 03:29:07 -    1.148
> +++ sys/systm.h 23 Dec 2020 23:03:31 -
> @@ -107,6 +107,8 @@ extern struct vnode *rootvp;    /* vnode eq
>  extern dev_t swapdev;           /* swapping device */
>  extern struct vnode *swapdev_vp;/* vnode equivalent to above */
>
> +extern int nowake;              /* dead wakeup(9) channel */
> +
>  struct proc;
>  struct process;
>  #define curproc curcpu()->ci_curproc
Re: tsleep(9): add global "nowake" channel
On Wed, Dec 23, 2020 at 02:42:18PM -0700, Theo de Raadt wrote:
> I agree.  This chunk below is really gross and does not follow the
> special wakeup channel metaphor.
>
> It is *entirely clear* that a channel called "nowake" has no wakeup.
> Like duh.
>
> > +/*
> > + * nowake is a global sleep channel for threads that do not want
> > + * to receive wakeup(9) broadcasts.
> > + */
> > +int __nowake;
> > +void *nowake = &__nowake;

So we'll go with this?

Index: kern/kern_synch.c
===
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.172
diff -u -p -r1.172 kern_synch.c
--- kern/kern_synch.c   7 Dec 2020 16:55:29 -    1.172
+++ kern/kern_synch.c   23 Dec 2020 23:03:31 -
@@ -87,6 +87,11 @@ sleep_queue_init(void)
         TAILQ_INIT([i]);
 }
 
+/*
+ * Global sleep channel for threads that do not want to
+ * receive wakeup(9) broadcasts.
+ */
+int nowake;
 
 /*
  * During autoconfiguration or after a panic, a sleep will simply
@@ -119,6 +124,7 @@ tsleep(const volatile void *ident, int p
 #endif
 
         KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
+        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
 
 #ifdef MULTIPROCESSOR
         KASSERT(timo || _kernel_lock_held());
@@ -213,6 +219,7 @@ msleep(const volatile void *ident, struc
 #endif
 
         KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
+        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
         KASSERT(mtx != NULL);
 
         if (priority & PCATCH)
@@ -301,6 +308,7 @@ rwsleep(const volatile void *ident, stru
         int error, status;
 
         KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
+        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
         rw_assert_anylock(rwl);
         status = rw_status(rwl);

Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.148
diff -u -p -r1.148 systm.h
--- sys/systm.h 26 Aug 2020 03:29:07 -    1.148
+++ sys/systm.h 23 Dec 2020 23:03:31 -
@@ -107,6 +107,8 @@ extern struct vnode *rootvp;    /* vnode eq
 extern dev_t swapdev;           /* swapping device */
 extern struct vnode *swapdev_vp;/* vnode equivalent to above */
 
+extern int nowake;              /* dead wakeup(9) channel */
+
 struct proc;
 struct process;
 #define curproc curcpu()->ci_curproc
Re: rasops1
Hi Mark,

On Fri, Dec 18, 2020 at 10:33:52PM +0100, Mark Kettenis wrote:

> The diff below disables the optimized functions on little-endian
> architectures such that we always use rasops1_putchar().  This makes
> ssdfb(4) work with the default 8x16 font on arm64.

I noticed it was committed already, but it seems the following
directives:

#if defined(RASOPS_SMALL) && BYTE_ORDER == BIG_ENDIAN

Should have been:

#if !defined(RASOPS_SMALL) && BYTE_ORDER == BIG_ENDIAN

We want to include the optimized putchar functions only if RASOPS_SMALL
is not defined.
Re: acme-client(1) make -F flag use more obvious
On 2020/12/15 17:49, Florian Obser wrote:
>
> On 15 December 2020 14:56:41 CET, Stuart Henderson wrote:
> >On 2020/12/15 10:18, Solene Rapenne wrote:
> >> This is a small change to acme-client(1) because I find
> >> the explanation of -F flag not being obvious that you
> >> need it when you add/remove an alternative name in your
> >> domain config.
> >
> >This only works directly for adding. For removal you need to rm
> >the existing certificate.
>
> -F only handles forced renewals correctly.
> That it can be (ab)used to add subject alt names to a cert is an
> implementation detail.
>
> It would be nice if someone™ would fix this properly by acme-client
> detecting that cert and config do not agree anymore.

like this perhaps? if we don't want to do this automatically for some
reason, then we should at least extend beck's recent change so that -F
handles names that are _removed_ from config, not just added as he did.

Index: revokeproc.c
===
RCS file: /cvs/src/usr.sbin/acme-client/revokeproc.c,v
retrieving revision 1.16
diff -u -p -r1.16 revokeproc.c
--- revokeproc.c        18 Nov 2020 20:54:43 -    1.16
+++ revokeproc.c        23 Dec 2020 22:20:43 -
@@ -202,7 +202,9 @@ revokeproc(int fd, const char *certfile,
 
         if (san == NULL) {
                 warnx("%s: does not have a SAN entry", certfile);
-                goto out;
+                if (revocate)
+                        goto out;
+                force = 2;
         }
 
         /* An array of buckets: the number of entries found. */
@@ -230,20 +232,29 @@ revokeproc(int fd, const char *certfile,
                         if (strcmp(tok, alts[j]) == 0)
                                 break;
                 if (j == altsz) {
-                        warnx("%s: unknown SAN entry: %s", certfile, tok);
-                        goto out;
+                        if (revocate) {
+                                warnx("%s: unknown SAN entry: %s", certfile, tok);
+                                goto out;
+                        }
+                        force = 2;
                 }
                 if (found[j]++) {
-                        warnx("%s: duplicate SAN entry: %s", certfile, tok);
-                        goto out;
+                        if (revocate) {
+                                warnx("%s: duplicate SAN entry: %s", certfile, tok);
+                                goto out;
+                        }
+                        force = 2;
                 }
         }
 
-        for (j = 0; !force && j < altsz; j++) {
+        for (j = 0; j < altsz; j++) {
                 if (found[j])
                         continue;
-                warnx("%s: domain not listed: %s", certfile, alts[j]);
-                goto out;
+                if (revocate) {
+                        warnx("%s: domain not listed: %s", certfile, alts[j]);
+                        goto out;
+                }
+                force = 2;
         }
 
         /*
@@ -294,7 +305,8 @@ revokeproc(int fd, const char *certfile,
                     certfile, (long long)(t - time(NULL)) / 24 / 60 / 60);
 
         if (rop == REVOKE_OK && force) {
-                warnx("%s: forcing renewal", certfile);
+                warnx("%s: %sforcing renewal", certfile,
+                    force == 2 ? "domain list changed, " : "");
                 rop = REVOKE_EXP;
         }

Index: acme-client.1
===
RCS file: /cvs/src/usr.sbin/acme-client/acme-client.1,v
retrieving revision 1.38
diff -u -p -r1.38 acme-client.1
--- acme-client.1       19 Dec 2020 18:05:44 -    1.38
+++ acme-client.1       23 Dec 2020 22:20:43 -
@@ -67,10 +67,8 @@ location "/.well-known/acme-challenge/*"
 The options are as follows:
 .Bl -tag -width Ds
 .It Fl F
-Force certificate renewal, even if it's too soon.
-This is required if new domain alternative names
-were added to
-.Xr acme-client.conf 5 .
+Force certificate renewal, even if it has more than 30 days
+validity.
 .It Fl f Ar configfile
 Specify an alternative configuration file.
 .It Fl n
Re: acme-client(1): fulfil all challenges, then tell the CA
On 2020/12/23 18:09, Florian Obser wrote:
> First fulfil all challenges then tell the CA that it should check.
>
> With a CSR with multiple SANs acme-client would write one challenge,
> tell the CA, write the next challenge and so on.
>
> For http-01 this doesn't matter but I think this will be nicer for dns-01
> because there are propagation delays to consider.
>
> Please be extra careful checking this. If I mess this up people might
> run into renewal issues months from now. And when that happens people
> tend to comment... (Which I also pull this out of the big diff I'm
> currently working on for dns-01.)
>
> OK?

I tested by forcibly renewing some multi-name certificates. I saw that
letsencrypt didn't bother re-challenging because they already had a
recent auth so I moved them to buypass, all looks good. (FWIW I did
some ecdsa as well as rsa, not that it matters for this test).

Reads good and works for me, OK.

> diff --git netproc.c netproc.c
> index 38732a4dd01..7c502643acc 100644
> --- netproc.c
> +++ netproc.c
> @@ -840,7 +840,12 @@ netproc(int kfd, int afd, int Cfd, int cfd, int dfd, int rfd,
>                  if (readop(Cfd, COMM_CHNG_ACK) != CHNG_ACK)
>                          goto out;
>
> -                /* Write to the CA that it's ready. */
> +        }
> +        /* Write to the CA that it's ready. */
> +        for (i = 0; i < order.authsz; i++) {
> +                if (chngs[i].status == CHNG_VALID ||
> +                    chngs[i].status == CHNG_INVALID)
> +                        continue;
>                  if (!dochngresp(&c, &chngs[i]))
>                          goto out;
>          }
>
> --
> I'm not entirely sure you are real.
Re: tsleep(9): add global "nowake" channel
I agree.  This chunk below is really gross and does not follow the
special wakeup channel metaphor.

It is *entirely clear* that a channel called "nowake" has no wakeup.
Like duh.

> +/*
> + * nowake is a global sleep channel for threads that do not want
> + * to receive wakeup(9) broadcasts.
> + */
> +int __nowake;
> +void *nowake = &__nowake;
Re: tsleep(9): add global "nowake" channel
> Date: Wed, 23 Dec 2020 14:59:02 -0600
> From: Scott Cheloha
>
> Okay, let's try one more time.
>
> This patch adds a global sleep channel, "nowake", for sleeping threads
> that don't want to receive wakeup(9) broadcasts.
>
> You use it like this:
>
> #include <sys/systm.h>
>
> tsleep(nowake, ...);
>
> I've added additional assertions to tsleep, msleep, and rwsleep that
> ensures that there is *some* way to wake up the thread.  You need
> either an ident that is not nowake, PCATCH, or a timeout.
>
> I prefer using indirection here to save a character when using it as
> the ident, i.e. "nowake" is shorter than "&nowake".

I disagree.  &nowake is obvious and doesn't require the weird
underscored symbol indirection.

> I'll document it when it is used more broadly in the kernel.
>
> ok?
>
> Index: kern/kern_synch.c
> ===
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.172
> diff -u -p -r1.172 kern_synch.c
> --- kern/kern_synch.c   7 Dec 2020 16:55:29 -    1.172
> +++ kern/kern_synch.c   23 Dec 2020 20:56:35 -
> @@ -87,6 +87,12 @@ sleep_queue_init(void)
>          TAILQ_INIT([i]);
>  }
>
> +/*
> + * nowake is a global sleep channel for threads that do not want
> + * to receive wakeup(9) broadcasts.
> + */
> +int __nowake;
> +void *nowake = &__nowake;
>
>  /*
>   * During autoconfiguration or after a panic, a sleep will simply
> @@ -119,6 +125,7 @@ tsleep(const volatile void *ident, int p
>  #endif
>
>          KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
> +        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
>
> #ifdef MULTIPROCESSOR
>          KASSERT(timo || _kernel_lock_held());
> @@ -213,6 +220,7 @@ msleep(const volatile void *ident, struc
> #endif
>
>          KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
> +        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
>          KASSERT(mtx != NULL);
>
>          if (priority & PCATCH)
> @@ -301,6 +309,7 @@ rwsleep(const volatile void *ident, stru
>          int error, status;
>
>          KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
> +        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
>          rw_assert_anylock(rwl);
>          status = rw_status(rwl);
>
> Index: sys/systm.h
> ===
> RCS file: /cvs/src/sys/sys/systm.h,v
> retrieving revision 1.148
> diff -u -p -r1.148 systm.h
> --- sys/systm.h 26 Aug 2020 03:29:07 -    1.148
> +++ sys/systm.h 23 Dec 2020 20:56:35 -
> @@ -107,6 +107,8 @@ extern struct vnode *rootvp;    /* vnode eq
>  extern dev_t swapdev;           /* swapping device */
>  extern struct vnode *swapdev_vp;/* vnode equivalent to above */
>
> +extern void *nowake;            /* dead wakeup(9) channel */
> +
>  struct proc;
>  struct process;
>  #define curproc curcpu()->ci_curproc
Re: i386: apm(4): apm_thread(): sleep without lbolt
On Tue, Dec 15, 2020 at 09:15:31AM -0300, Martin Pieuchot wrote:
> On 11/12/20(Fri) 19:17, Scott Cheloha wrote:
> > Here's another sleep that doesn't need lbolt.
> >
> > The idea here is to call apm_periodic_check() once a second.
> > We can do that without lbolt.
> >
> > Is there some other address that would be more appropriate for this
> > thread to sleep on?  It doesn't look like any apm(4) code calls
> > wakeup(9) on lbolt so I've just replaced it with a local channel.
>
> Note sure we want to grow the stack just for that.  Any member of `sc',
> or even `sc' itself if this doesn't conflict, could be used as wait
> channel.

Assuming we go ahead with the global nowake channel, is this ok?

Index: apm.c
===
RCS file: /cvs/src/sys/arch/i386/i386/apm.c,v
retrieving revision 1.125
diff -u -p -r1.125 apm.c
--- apm.c       24 Jun 2020 22:03:40 -    1.125
+++ apm.c       23 Dec 2020 21:03:50 -
@@ -909,7 +909,7 @@ apm_thread(void *v)
                 rw_enter_write(&sc->sc_lock);
                 (void) apm_periodic_check(sc);
                 rw_exit_write(&sc->sc_lock);
-                tsleep_nsec(&lbolt, PWAIT, "apmev", INFSLP);
+                tsleep_nsec(nowake, PWAIT, "apmev", SEC_TO_NSEC(1));
         }
 }
tsleep(9): add global "nowake" channel
Okay, let's try one more time.

This patch adds a global sleep channel, "nowake", for sleeping threads
that don't want to receive wakeup(9) broadcasts.

You use it like this:

#include <sys/systm.h>

tsleep(nowake, ...);

I've added additional assertions to tsleep, msleep, and rwsleep that
ensures that there is *some* way to wake up the thread.  You need
either an ident that is not nowake, PCATCH, or a timeout.

I prefer using indirection here to save a character when using it as
the ident, i.e. "nowake" is shorter than "&nowake".

I'll document it when it is used more broadly in the kernel.

ok?

Index: kern/kern_synch.c
===
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.172
diff -u -p -r1.172 kern_synch.c
--- kern/kern_synch.c   7 Dec 2020 16:55:29 -    1.172
+++ kern/kern_synch.c   23 Dec 2020 20:56:35 -
@@ -87,6 +87,12 @@ sleep_queue_init(void)
         TAILQ_INIT([i]);
 }
 
+/*
+ * nowake is a global sleep channel for threads that do not want
+ * to receive wakeup(9) broadcasts.
+ */
+int __nowake;
+void *nowake = &__nowake;
 
 /*
  * During autoconfiguration or after a panic, a sleep will simply
@@ -119,6 +125,7 @@ tsleep(const volatile void *ident, int p
 #endif
 
         KASSERT((priority & ~(PRIMASK | PCATCH)) == 0);
+        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
 
 #ifdef MULTIPROCESSOR
         KASSERT(timo || _kernel_lock_held());
@@ -213,6 +220,7 @@ msleep(const volatile void *ident, struc
 #endif
 
         KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
+        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
         KASSERT(mtx != NULL);
 
         if (priority & PCATCH)
@@ -301,6 +309,7 @@ rwsleep(const volatile void *ident, stru
         int error, status;
 
         KASSERT((priority & ~(PRIMASK | PCATCH | PNORELOCK)) == 0);
+        KASSERT(ident != nowake || ISSET(priority, PCATCH) || timo != 0);
         rw_assert_anylock(rwl);
         status = rw_status(rwl);

Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.148
diff -u -p -r1.148 systm.h
--- sys/systm.h 26 Aug 2020 03:29:07 -    1.148
+++ sys/systm.h 23 Dec 2020 20:56:35 -
@@ -107,6 +107,8 @@ extern struct vnode *rootvp;    /* vnode eq
 extern dev_t swapdev;           /* swapping device */
 extern struct vnode *swapdev_vp;/* vnode equivalent to above */
 
+extern void *nowake;            /* dead wakeup(9) channel */
+
 struct proc;
 struct process;
 #define curproc curcpu()->ci_curproc
Re: i386 pmap diff
On 23.12.2020. 18:24, Mark Kettenis wrote:
> Diff below switches the i386 pmap to use the modern km_alloc(9)
> functions and uses IPL_VM for the pmap pool, following the example of
> amd64.
>
> Don't have easy access to an i386 machine right now, so this has only
> been compile tested.
>
> ok (if it works)?

Hi,

i've checked out the tree a few minutes ago and i'm getting this log
when applying the diff:

r620-2# cat arch/powerpc64/powerpc64/pmap.c.rej
@@ -995,7 +995,7 @@
 {
         int i;
 
-        pool_init(&pmap_pmap_pool, sizeof(struct pmap), 0, IPL_NONE, 0,
+        pool_init(&pmap_pmap_pool, sizeof(struct pmap), 0, IPL_MPFLOOR, 0,
             "pmap", &pool_allocator_single);
         pool_setlowat(&pmap_pmap_pool, 2);
         pool_init(&pmap_vp_pool, sizeof(struct pmapvp1), 0, IPL_VM, 0,

but i've compiled the kernel with your diff and i'm getting this panic:

acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
panic: uvm_fault(0xd0e56c00, 0xf552f000, 0, 1) -> e
Stopped at      db_enter+0x4:   popl    %ebp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*     0      0      0         0x1     0x2000    0K  swapper
db_enter(d0e2fab5,d110aa54,0,f552f000,d0e798c8) at db_enter+0x4
panic(d0c3302c,d0e56c00,f552f000,1,e) at panic+0xd3
kpageflttrap(d110aac4,f552f438,f552f438,,0) at kpageflttrap+0x14d
trap(d110aac4) at trap+0x26a
calltrap(8,10806,d0a46644,d110ab28,fee0) at calltrap+0xc
pmap_enter_special_pae(d0e1,fee0,3,10) at pmap_enter_special_pae+0x21e
lapic_boot_init(fee0) at lapic_boot_init+0x43
acpimadt_attach(d6f08400,d6f34fc0,d110ac58) at acpimadt_attach+0x12c
config_attach(d6f08400,d0e202e0,d110ac58,d05abc50) at config_attach+0x18a
config_found_sm(d6f08400,d110ac58,d05abc50,d05ae190) at config_found_sm+0x29
acpi_attach_common(d6f08400,f0c40) at acpi_attach_common+0x556
acpi_attach(d6f120c0,d6f08400,d110adb8) at acpi_attach+0x2c
config_attach(d6f120c0,d0e204a0,d110adb8,d068fe40) at config_attach+0x18a
config_found_sm(d6f120c0,d110adb8,d068fe40,0) at config_found_sm+0x29
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.
enc: remove unused start routine
enc(4) does not use the ifqueue API at all; IPsec packets are directly
transformed in the IP input/output routines.  enc_start() is never
called (by design) so I'd like to remove it for clarity - similar to
lo(4), which does not have a start routine defined either.

Tested with various iked(8) tunnels on sparc64 and amd64.

Feedback? OK?

Index: if_enc.c
===
RCS file: /cvs/src/sys/net/if_enc.c,v
retrieving revision 1.77
diff -u -p -r1.77 if_enc.c
--- if_enc.c    10 Jul 2020 13:22:22 -    1.77
+++ if_enc.c    23 Dec 2020 01:37:31 -
@@ -44,7 +44,6 @@ void  encattach(int);
 
 int     enc_clone_create(struct if_clone *, int);
 int     enc_clone_destroy(struct ifnet *);
-void    enc_start(struct ifnet *);
 int     enc_output(struct ifnet *, struct mbuf *, struct sockaddr *,
             struct rtentry *);
 int     enc_ioctl(struct ifnet *, u_long, caddr_t);
@@ -85,7 +84,6 @@ enc_clone_create(struct if_clone *ifc, i
         ifp->if_softc = sc;
         ifp->if_type = IFT_ENC;
         ifp->if_xflags = IFXF_CLONED;
-        ifp->if_start = enc_start;
         ifp->if_output = enc_output;
         ifp->if_ioctl = enc_ioctl;
         ifp->if_hdrlen = ENC_HDRLEN;
@@ -157,19 +155,6 @@ enc_clone_destroy(struct ifnet *ifp)
         free(sc, M_DEVBUF, sizeof(*sc));
 
         return (0);
-}
-
-void
-enc_start(struct ifnet *ifp)
-{
-        struct mbuf *m;
-
-        for (;;) {
-                m = ifq_dequeue(&ifp->if_snd);
-                if (m == NULL)
-                        break;
-                m_freem(m);
-        }
 }
 
 int
i386 pmap diff
Diff below switches the i386 pmap to use the modern km_alloc(9)
functions and uses IPL_VM for the pmap pool, following the example of
amd64.

Don't have easy access to an i386 machine right now, so this has only
been compile tested.

ok (if it works)?

Index: arch/i386/i386/pmap.c
===
RCS file: /cvs/src/sys/arch/i386/i386/pmap.c,v
retrieving revision 1.209
diff -u -p -r1.209 pmap.c
--- arch/i386/i386/pmap.c       24 Sep 2020 11:36:50 -    1.209
+++ arch/i386/i386/pmap.c       23 Dec 2020 17:17:29 -
@@ -1065,7 +1065,7 @@ pmap_bootstrap(vaddr_t kva_start)
          * initialize the pmap pool.
          */
 
-        pool_init(&pmap_pmap_pool, sizeof(struct pmap), 32, IPL_NONE, 0,
+        pool_init(&pmap_pmap_pool, sizeof(struct pmap), 32, IPL_VM, 0,
             "pmappl", NULL);
         pool_init(&pmap_pv_pool, sizeof(struct pv_entry), 0, IPL_VM, 0,
             "pvpl", &pmap_pv_page_allocator);
@@ -1363,11 +1363,10 @@ void
 pmap_pinit_pd_86(struct pmap *pmap)
 {
         /* allocate PDP */
-        pmap->pm_pdir = uvm_km_alloc(kernel_map, NBPG);
+        pmap->pm_pdir = (vaddr_t)km_alloc(NBPG, &kv_any, &kp_dirty, &kd_waitok);
         if (pmap->pm_pdir == 0)
                 panic("pmap_pinit_pd_86: kernel_map out of virtual space!");
-        pmap_extract(pmap_kernel(), (vaddr_t)pmap->pm_pdir,
-            &pmap->pm_pdirpa);
+        pmap_extract(pmap_kernel(), (vaddr_t)pmap->pm_pdir, &pmap->pm_pdirpa);
         pmap->pm_pdirsize = NBPG;
 
         /* init PDP */
@@ -1395,7 +1394,8 @@ pmap_pinit_pd_86(struct pmap *pmap)
          * execution, one that lacks all kernel mappings.
          */
         if (cpu_meltdown) {
-                pmap->pm_pdir_intel = uvm_km_zalloc(kernel_map, NBPG);
+                pmap->pm_pdir_intel =
+                    (vaddr_t)km_alloc(NBPG, &kv_any, &kp_zero, &kd_waitok);
                 if (pmap->pm_pdir_intel == 0)
                         panic("%s: kernel_map out of virtual space!",
                             __func__);
 
@@ -1447,11 +1447,12 @@ pmap_destroy(struct pmap *pmap)
                 uvm_pagefree(pg);
         }
 
-        uvm_km_free(kernel_map, pmap->pm_pdir, pmap->pm_pdirsize);
+        km_free((void *)pmap->pm_pdir, pmap->pm_pdirsize, &kv_any, &kp_dirty);
         pmap->pm_pdir = 0;
 
         if (pmap->pm_pdir_intel) {
-                uvm_km_free(kernel_map, pmap->pm_pdir_intel, pmap->pm_pdirsize);
+                km_free((void *)pmap->pm_pdir_intel, pmap->pm_pdirsize,
+                    &kv_any, &kp_zero);
                 pmap->pm_pdir_intel = 0;
         }
 
@@ -2520,8 +2521,9 @@ pmap_enter_special_86(vaddr_t va, paddr_
                     __func__, va);
 
         if (!pmap->pm_pdir_intel) {
-                if ((pmap->pm_pdir_intel = uvm_km_zalloc(kernel_map, NBPG))
-                    == 0)
+                pmap->pm_pdir_intel =
+                    (vaddr_t)km_alloc(NBPG, &kv_any, &kp_zero, &kd_waitok);
+                if (pmap->pm_pdir_intel == 0)
                         panic("%s: kernel_map out of virtual space!",
                             __func__);
                 if (!pmap_extract(pmap, pmap->pm_pdir_intel,
                     &pmap->pm_pdirpa_intel))

Index: arch/i386/i386/pmapae.c
===
RCS file: /cvs/src/sys/arch/i386/i386/pmapae.c,v
retrieving revision 1.60
diff -u -p -r1.60 pmapae.c
--- arch/i386/i386/pmapae.c     23 Sep 2020 15:13:26 -    1.60
+++ arch/i386/i386/pmapae.c     23 Dec 2020 17:17:29 -
@@ -738,7 +738,7 @@ pmap_bootstrap_pae(void)
                                     (uint32_t)VM_PAGE_TO_PHYS(ptppg));
                         }
                 }
-                uvm_km_free(kernel_map, (vaddr_t)pd, NBPG);
+                km_free(pd, NBPG, &kv_any, &kp_dirty);
                 DPRINTF("%s: freeing PDP 0x%x\n", __func__, (uint32_t)pd);
         }
 
@@ -944,7 +944,8 @@ pmap_pinit_pd_pae(struct pmap *pmap)
         paddr_t pdidx[4];
 
         /* allocate PDP */
-        pmap->pm_pdir = uvm_km_alloc(kernel_map, 4 * NBPG);
+        pmap->pm_pdir = (vaddr_t)km_alloc(4 * NBPG, &kv_any,
+            &kp_dirty, &kd_waitok);
         if (pmap->pm_pdir == 0)
                 panic("pmap_pinit_pd_pae: kernel_map out of virtual space!");
         /* page index is in the pmap! */
@@ -997,7 +998,8 @@ pmap_pinit_pd_pae(struct pmap *pmap)
         if (cpu_meltdown) {
                 int i;
 
-                if ((va = uvm_km_zalloc(kernel_map, 4 * NBPG)) == 0)
+                va = (vaddr_t)km_alloc(4 * NBPG, &kv_any, &kp_zero,
+                    &kd_waitok);
+                if (va == 0)
                         panic("%s: kernel_map out of virtual space!",
                             __func__);
                 if (!pmap_extract(pmap_kernel(),
                     (vaddr_t)&pmap->pm_pdidx_intel, &pmap->pm_pdirpa_intel))
@@ -1936,7 +1938,8 @@ pmap_enter_special_pae(vaddr_t va, paddr
                     __func__, va);
 
         if (!pmap->pm_pdir_intel) {
-                if ((vapd = uvm_km_zalloc(kernel_map, 4 * NBPG)) == 0)
+                vapd = (vaddr_t)km_alloc(4 * NBPG, &kv_any, &kp_zero,
+                    &kd_waitok);
+                if (vapd == 0)
Re: pool(9): remove ticks (attempt 2)
On Fri, Dec 11, 2020 at 05:32:54PM -0600, Scott Cheloha wrote:
> On Fri, Dec 11, 2020 at 07:52:45PM +0100, Mark Kettenis wrote:
> > > Date: Fri, 11 Dec 2020 11:51:54 -0600
> > > From: Scott Cheloha
> > >
> > > On Fri, Dec 11, 2020 at 09:49:07AM -0300, Martin Pieuchot wrote:
> > > >
> > > > I'm not sure to understand, can't we do:
> > > >
> > > >         pool_wait_free = SEC_TO_NSEC(1);
> > > >         pool_wait_gc = SEC_TO_NSEC(8);
> > > >
> > > [...]
> > >
> > > We can do that at runtime but not at compile time.  SEC_TO_NSEC(1)
> > > isn't a constant so that won't compile (I just tried).
> > >
> > > We _could_ do something like this:
> > >
> > > #define POOL_WAIT_FREE        SEC_TO_NSEC(1)
> > >
> > > I think the compiler will probably inline the result and elide the
> > > overflow check because the input is a constant.  I don't know how to
> > > verify this, but my limited understanding of compilers suggests that
> > > this is totally possible.
> >
> > Yes.  The consequence of that is that the values are no longer
> > patchable.  That may not be very important though (I never really use
> > that possibility).
>
> What do you mean by "patchable"?  I assume you don't mean the source
> code.
>
> (Also, you did not comment on the struct stuff below so I'm proceeding
> with the impression there's nothing at issue there.)

Hearing nothing after two weeks I assume nobody cares if timeouts are
no longer patchable.

Looking for OKs on the attached patch.

CC tedu@/dlg@, who added these timeouts to pool(9) in the first place:

https://github.com/openbsd/src/commit/786f6c84e747ccb9777ad35d2b01160679aec089

Index: sys/pool.h
===
RCS file: /cvs/src/sys/sys/pool.h,v
retrieving revision 1.77
diff -u -p -r1.77 pool.h
--- sys/pool.h  19 Jul 2019 09:03:03 -    1.77
+++ sys/pool.h  23 Dec 2020 17:06:19 -
@@ -201,9 +201,9 @@ struct pool {
         u_int           pr_cache_items; /* target list length */
         u_int           pr_cache_contention;
         u_int           pr_cache_contention_prev;
-        int             pr_cache_tick;  /* time idle list was empty */
-        int             pr_cache_nout;
+        uint64_t        pr_cache_timestamp;     /* time idle list was empty */
         uint64_t        pr_cache_ngc;   /* # of times the gc released a list */
+        int             pr_cache_nout;
 
         u_int           pr_align;
         u_int           pr_maxcolors;   /* Cache coloring */

Index: kern/subr_pool.c
===
RCS file: /cvs/src/sys/kern/subr_pool.c,v
retrieving revision 1.230
diff -u -p -r1.230 subr_pool.c
--- kern/subr_pool.c    24 Jan 2020 06:31:17 -    1.230
+++ kern/subr_pool.c    23 Dec 2020 17:06:20 -
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -148,7 +149,7 @@ struct pool_page_header {
         caddr_t         ph_page;        /* this page's address */
         caddr_t         ph_colored;     /* page's colored address */
         unsigned long   ph_magic;
-        int             ph_tick;
+        uint64_t        ph_timestamp;
 };
 #define POOL_MAGICBIT (1 << 3) /* keep away from perturbed low bits */
 #define POOL_PHPOISON(ph) ISSET((ph)->ph_magic, POOL_MAGICBIT)
@@ -266,8 +267,22 @@ void    pool_gc_sched(void *);
 struct timeout pool_gc_tick = TIMEOUT_INITIALIZER(pool_gc_sched, NULL);
 void    pool_gc_pages(void *);
 struct task pool_gc_task = TASK_INITIALIZER(pool_gc_pages, NULL);
-int pool_wait_free = 1;
-int pool_wait_gc = 8;
+
+#define POOL_WAIT_FREE  SEC_TO_NSEC(1)
+#define POOL_WAIT_GC    SEC_TO_NSEC(8)
+
+/*
+ * TODO Move getnsecuptime() to kern_tc.c and document it when we
+ * have callers in other modules.
+ */
+static uint64_t
+getnsecuptime(void)
+{
+        struct timespec now;
+
+        getnanouptime(&now);
+        return TIMESPEC_TO_NSEC(&now);
+}
 
 RBT_PROTOTYPE(phtree, pool_page_header, ph_node, phtree_compare);
 
@@ -797,7 +812,7 @@ pool_put(struct pool *pp, void *v)
         /* is it time to free a page? */
         if (pp->pr_nidle > pp->pr_maxpages &&
             (ph = TAILQ_FIRST(&pp->pr_emptypages)) != NULL &&
-            (ticks - ph->ph_tick) > (hz * pool_wait_free)) {
+            getnsecuptime() - ph->ph_timestamp > POOL_WAIT_FREE) {
                 freeph = ph;
                 pool_p_remove(pp, freeph);
         }
@@ -864,7 +879,7 @@ pool_do_put(struct pool *pp, void *v)
                  */
                 pp->pr_nidle++;
 
-                ph->ph_tick = ticks;
+                ph->ph_timestamp = getnsecuptime();
                 TAILQ_REMOVE(&pp->pr_partpages, ph, ph_entry);
                 TAILQ_INSERT_TAIL(&pp->pr_emptypages, ph, ph_entry);
                 pool_update_curpage(pp);
@@ -1566,7 +1581,7 @@ pool_gc_pages(void *null)
                 /* is it time to free a page? */
                 if (pp->pr_nidle > pp->pr_minpages &&
                     (ph =
Re: xhci zero length transfers 'leak' one transfer buffer count
On Wed, 23 Dec 2020 17:35:36 +0100 Patrick Wildt wrote: > Am Wed, Dec 23, 2020 at 10:44:21AM +0100 schrieb Marcus Glocker: > > On Wed, 23 Dec 2020 09:47:44 +0100 > > Marcus Glocker wrote: > > > > > On Tue, 22 Dec 2020 20:55:41 +0100 > > > Marcus Glocker wrote: > > > > > > > > > Did you consider incrementing xx->ntrb instead? > > > > > > > > >That doesn't work either, because the status completion code > > > > >needs xx->ntrb to be correct for the data TD to be handled > > > > >correctly. Incrementing xx->ntrb means the number of TRBs for > > > > >the data TD is incorrect, since it includes the (optional) > > > > >zero TD's TRB. > > > > > > > > > >In this case the zero TD allocates a TRB but doesn't do proper > > > > >accounting, and currently there's no place where this could be > > > > >accounted properly. > > > > > > > > > >In the end it's all software, so I guess the diff will simply > > > > >have to be bigger than just a one-liner. > > > > > > > > > > With the diff below the produced TRB isn't accounted which > > > > > > might lead > > > > > > to and off-by-one. > > > > > > > > Sorry, I missed this thread and had to re-grab the last mail > > > > from MARC. > > > > > > > > Can't we just take account of the zero trb separately? > > > > > > We also want to reset the zerotrb. > > > > Re-thinking this again I think we should only increase the zerotrb > > to avoid again a possible miss match for free_trbs, and leave the > > responsibility to the caller of xhci_xfer_get_trb() to request the > > right amount of zerotrb. 
> > > > > > Index: dev/usb/xhci.c > > === > > RCS file: /cvs/src/sys/dev/usb/xhci.c,v > > retrieving revision 1.119 > > diff -u -p -u -p -r1.119 xhci.c > > --- dev/usb/xhci.c 31 Jul 2020 19:27:57 - 1.119 > > +++ dev/usb/xhci.c 23 Dec 2020 09:38:58 - > > @@ -1135,8 +1135,10 @@ xhci_xfer_done(struct usbd_xfer *xfer) > > i = (xp->ring.ntrb - 1); > > } > > xp->free_trbs += xx->ntrb; > > + xp->free_trbs += xx->zerotrb; > > xx->index = -1; > > xx->ntrb = 0; > > + xx->zerotrb = 0; > > > > timeout_del(&xfer->timeout_handle); > > usb_rem_task(xfer->device, &xfer->abort_task); > > @@ -1842,6 +1844,7 @@ xhci_xfer_get_trb(struct xhci_softc *sc, > > switch (last) { > > case -1:/* This will be a zero-length TD. */ > > xp->pending_xfers[xp->ring.index] = NULL; > > + xx->zerotrb += 1; > > break; > > case 0: /* This will be in a chain. */ > > xp->pending_xfers[xp->ring.index] = xfer; > > Index: dev/usb/xhcivar.h > > === > > RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v > > retrieving revision 1.11 > > diff -u -p -u -p -r1.11 xhcivar.h > > --- dev/usb/xhcivar.h 6 Oct 2019 17:30:00 - 1.11 > > +++ dev/usb/xhcivar.h 23 Dec 2020 09:38:58 - > > @@ -40,6 +40,7 @@ struct xhci_xfer { > > struct usbd_xfer xfer; > > int index; /* Index of the last TRB */ > > size_t ntrb; /* Number of associated TRBs */ > > + size_t zerotrb; /* Is zero len TRB required? */ > > It's a zero-length TD, not TRB. I mean, it indeed is a zero-length > TRB, but the important thing is that it's part of an extra TD. So at > least update the comment, maybe even the variable name. > > The difference is that a TD means that it's a separate transfer. It > also completes separately from the TD before. In theory xfer done > will be called on the initial TD, not on the zero TD, which means > that we could have a race where our accounting "frees" the zero TD, > even though the controller isn't there yet.
In practice I think this > is not an issue, the ring's hopefully long enough that we don't > immediately reuse the TRB that we just freed. > > So, I think the approach taken in this diff is fine, the code looks > good, only the naming I think can be improved. Maybe really just call > it zerotd, then it also fits with the comment. Right, makes sense. Should we name the variable and comment like that so it's clear? Index: dev/usb/xhci.c === RCS file: /cvs/src/sys/dev/usb/xhci.c,v retrieving revision 1.119 diff -u -p -u -p -r1.119 xhci.c --- dev/usb/xhci.c 31 Jul 2020 19:27:57 - 1.119 +++ dev/usb/xhci.c 23 Dec 2020 17:10:40 - @@ -1135,8 +1135,10 @@ xhci_xfer_done(struct usbd_xfer *xfer) i = (xp->ring.ntrb - 1); } xp->free_trbs += xx->ntrb; + xp->free_trbs += xx->zerotd; xx->index = -1; xx->ntrb = 0; + xx->zerotd = 0; timeout_del(&xfer->timeout_handle); usb_rem_task(xfer->device, &xfer->abort_task); @@ -1842,6 +1844,7 @@ xhci_xfer_get_trb(struct xhci_softc *sc,
acme-client(1): fulfil all challenges, then tell the CA
First fulfil all challenges, then tell the CA that it should check. With a CSR with multiple SANs acme-client would write one challenge, tell the CA, write the next challenge and so on. For http-01 this doesn't matter but I think this will be nicer for dns-01 because there are propagation delays to consider. Please be extra careful checking this. If I mess this up people might run into renewal issues months from now. And when that happens people tend to comment... (I also pulled this out of the big diff I'm currently working on for dns-01.) OK? diff --git netproc.c netproc.c index 38732a4dd01..7c502643acc 100644 --- netproc.c +++ netproc.c @@ -840,7 +840,12 @@ netproc(int kfd, int afd, int Cfd, int cfd, int dfd, int rfd, if (readop(Cfd, COMM_CHNG_ACK) != CHNG_ACK) goto out; - /* Write to the CA that it's ready. */ + } + /* Write to the CA that it's ready. */ + for (i = 0; i < order.authsz; i++) { + if (chngs[i].status == CHNG_VALID || + chngs[i].status == CHNG_INVALID) + continue; if (!dochngresp(&c, &chngs[i])) goto out; } -- I'm not entirely sure you are real.
Refactor klist insertion and removal
The following diff: * renames existing functions klist_insert() and klist_remove() to klist_insert_locked() and klist_remove_locked(), respectively. These functions expect that the caller has acquired the klist lock. * adds new functions klist_insert() and klist_remove() that acquire the klist lock internally. When the diff is applied, no calls of the new klist_insert() and klist_remove() should appear. Nothing related to locking is changed yet. Note that the kernel lock serves as the klist lock by default. The functions with internal locking will allow some code simplification when the caller does not need the klist lock for anything else. In many cases, the f_detach callback only does klist removal. This could be utilized to compact the code even further. OK? Index: arch/arm64/dev/apm.c === RCS file: src/sys/arch/arm64/dev/apm.c,v retrieving revision 1.5 diff -u -p -r1.5 apm.c --- arch/arm64/dev/apm.c 29 May 2020 04:42:23 - 1.5 +++ arch/arm64/dev/apm.c 23 Dec 2020 16:21:29 - @@ -270,7 +270,7 @@ filt_apmrdetach(struct knote *kn) { struct apm_softc *sc = (struct apm_softc *)kn->kn_hook; - klist_remove(&sc->sc_note, kn); + klist_remove_locked(&sc->sc_note, kn); } int @@ -302,7 +302,7 @@ apmkqfilter(dev_t dev, struct knote *kn) } kn->kn_hook = (caddr_t)sc; - klist_insert(&sc->sc_note, kn); + klist_insert_locked(&sc->sc_note, kn); return (0); } Index: arch/i386/i386/apm.c === RCS file: src/sys/arch/i386/i386/apm.c,v retrieving revision 1.125 diff -u -p -r1.125 apm.c --- arch/i386/i386/apm.c 24 Jun 2020 22:03:40 - 1.125 +++ arch/i386/i386/apm.c 23 Dec 2020 16:21:29 - @@ -1117,7 +1117,7 @@ filt_apmrdetach(struct knote *kn) struct apm_softc *sc = (struct apm_softc *)kn->kn_hook; rw_enter_write(&sc->sc_lock); - klist_remove(&sc->sc_note, kn); + klist_remove_locked(&sc->sc_note, kn); rw_exit_write(&sc->sc_lock); } @@ -1151,7 +1151,7 @@ apmkqfilter(dev_t dev, struct knote *kn) kn->kn_hook = (caddr_t)sc; rw_enter_write(&sc->sc_lock); - klist_insert(&sc->sc_note, kn); + klist_insert_locked(&sc->sc_note, kn);
rw_exit_write(&sc->sc_lock); return (0); } Index: arch/loongson/dev/apm.c === RCS file: src/sys/arch/loongson/dev/apm.c,v retrieving revision 1.38 diff -u -p -r1.38 apm.c --- arch/loongson/dev/apm.c 24 Jun 2020 22:03:40 - 1.38 +++ arch/loongson/dev/apm.c 23 Dec 2020 16:21:29 - @@ -291,7 +291,7 @@ filt_apmrdetach(struct knote *kn) { struct apm_softc *sc = (struct apm_softc *)kn->kn_hook; - klist_remove(&sc->sc_note, kn); + klist_remove_locked(&sc->sc_note, kn); } int @@ -323,7 +323,7 @@ apmkqfilter(dev_t dev, struct knote *kn) } kn->kn_hook = (caddr_t)sc; - klist_insert(&sc->sc_note, kn); + klist_insert_locked(&sc->sc_note, kn); return (0); } Index: arch/macppc/dev/apm.c === RCS file: src/sys/arch/macppc/dev/apm.c,v retrieving revision 1.22 diff -u -p -r1.22 apm.c --- arch/macppc/dev/apm.c 7 Apr 2020 13:27:50 - 1.22 +++ arch/macppc/dev/apm.c 23 Dec 2020 16:21:29 - @@ -305,7 +305,7 @@ filt_apmrdetach(struct knote *kn) { struct apm_softc *sc = (struct apm_softc *)kn->kn_hook; - klist_remove(&sc->sc_note, kn); + klist_remove_locked(&sc->sc_note, kn); } int @@ -337,7 +337,7 @@ apmkqfilter(dev_t dev, struct knote *kn) } kn->kn_hook = (caddr_t)sc; - klist_insert(&sc->sc_note, kn); + klist_insert_locked(&sc->sc_note, kn); return (0); } Index: arch/sparc64/dev/vldcp.c === RCS file: src/sys/arch/sparc64/dev/vldcp.c,v retrieving revision 1.20 diff -u -p -r1.20 vldcp.c --- arch/sparc64/dev/vldcp.c 23 May 2020 11:29:37 - 1.20 +++ arch/sparc64/dev/vldcp.c 23 Dec 2020 16:21:29 - @@ -628,7 +628,7 @@ filt_vldcprdetach(struct knote *kn) int s; s = spltty(); - klist_remove(&sc->sc_rsel.si_note, kn); + klist_remove_locked(&sc->sc_rsel.si_note, kn); splx(s); } @@ -639,7 +639,7 @@ filt_vldcpwdetach(struct knote *kn) int s; s = spltty(); - klist_remove(&sc->sc_wsel.si_note, kn); + klist_remove_locked(&sc->sc_wsel.si_note, kn); splx(s); } @@ -733,7 +733,7 @@ vldcpkqfilter(dev_t dev, struct knote *k kn->kn_hook = sc; s = spltty(); - klist_insert(klist, kn); + klist_insert_locked(klist, kn); splx(s); return (0); Index: dev/audio.c === RCS
file: src/sys/dev/audio.c,v retrieving revision 1.191 diff -u -p -r1.191
Re: xhci zero length transfers 'leak' one transfer buffer count
Am Wed, Dec 23, 2020 at 10:44:21AM +0100 schrieb Marcus Glocker: > On Wed, 23 Dec 2020 09:47:44 +0100 > Marcus Glocker wrote: > > > On Tue, 22 Dec 2020 20:55:41 +0100 > > Marcus Glocker wrote: > > > > > > > Did you consider incrementing xx->ntrb instead? > > > > > > >That doesn't work either, because the status completion code needs > > > >xx->ntrb to be correct for the data TD to be handled correctly. > > > >Incrementing xx->ntrb means the number of TRBs for the data TD is > > > >incorrect, since it includes the (optional) zero TD's TRB. > > > > > > > >In this case the zero TD allocates a TRB but doesn't do proper > > > >accounting, and currently there's no place where this could be > > > >accounted properly. > > > > > > > >In the end it's all software, so I guess the diff will simply have > > > >to be bigger than just a one-liner. > > > > > > > > With the diff below the produced TRB isn't accounted which might > > > > > lead > > > > > to an off-by-one. > > > > > > Sorry, I missed this thread and had to re-grab the last mail from > > > MARC. > > > > > > Can't we just take account of the zero trb separately? > > > > We also want to reset the zerotrb. > > Re-thinking this again I think we should only increase the zerotrb to > avoid again a possible mismatch for free_trbs, and leave the > responsibility to the caller of xhci_xfer_get_trb() to request the > right amount of zerotrb.
> > > Index: dev/usb/xhci.c > === > RCS file: /cvs/src/sys/dev/usb/xhci.c,v > retrieving revision 1.119 > diff -u -p -u -p -r1.119 xhci.c > --- dev/usb/xhci.c 31 Jul 2020 19:27:57 - 1.119 > +++ dev/usb/xhci.c 23 Dec 2020 09:38:58 - > @@ -1135,8 +1135,10 @@ xhci_xfer_done(struct usbd_xfer *xfer) > i = (xp->ring.ntrb - 1); > } > xp->free_trbs += xx->ntrb; > + xp->free_trbs += xx->zerotrb; > xx->index = -1; > xx->ntrb = 0; > + xx->zerotrb = 0; > > timeout_del(&xfer->timeout_handle); > usb_rem_task(xfer->device, &xfer->abort_task); > @@ -1842,6 +1844,7 @@ xhci_xfer_get_trb(struct xhci_softc *sc, > switch (last) { > case -1:/* This will be a zero-length TD. */ > xp->pending_xfers[xp->ring.index] = NULL; > + xx->zerotrb += 1; > break; > case 0: /* This will be in a chain. */ > xp->pending_xfers[xp->ring.index] = xfer; > Index: dev/usb/xhcivar.h > === > RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v > retrieving revision 1.11 > diff -u -p -u -p -r1.11 xhcivar.h > --- dev/usb/xhcivar.h 6 Oct 2019 17:30:00 - 1.11 > +++ dev/usb/xhcivar.h 23 Dec 2020 09:38:58 - > @@ -40,6 +40,7 @@ struct xhci_xfer { > struct usbd_xfer xfer; > int index; /* Index of the last TRB */ > size_t ntrb; /* Number of associated TRBs */ > + size_t zerotrb; /* Is zero len TRB required? */ It's a zero-length TD, not TRB. I mean, it indeed is a zero-length TRB, but the important thing is that it's part of an extra TD. So at least update the comment, maybe even the variable name. The difference is that a TD means that it's a separate transfer. It also completes separately from the TD before. In theory xfer done will be called on the initial TD, not on the zero TD, which means that we could have a race where our accounting "frees" the zero TD, even though the controller isn't there yet.
So, I think the approach taken in this diff is fine, the code looks good, only the naming I think can be improved. Maybe really just call it zerotd, then it also fits with the comment. > }; > > struct xhci_ring { >
Re: netstat - proto ip record
On Wed, Dec 23, 2020 at 04:13:04PM +0100, Alexander Bluhm wrote: > On Wed, Dec 16, 2020 at 05:24:50PM +0100, Claudio Jeker wrote: > > On Wed, Dec 16, 2020 at 03:54:04PM +, Stuart Henderson wrote: > > > On 2020/12/16 16:43, Salvatore Cuzzilla wrote: > > > > Hi folks, > > > > > > > > is there any process associated with this netstat record? > > > > btw, what's the meaning of the state field with value '17'? > > > > > > > > ToTo@obsd ~ $ doas netstat -an -f inet > > > > Active Internet connections (including servers) > > > > Proto Recv-Q Send-Q Local Address Foreign Address > > > > (state) > > > > ip 0 0 *.**.*17 > > > > > > Are kernel and userland in sync? > > > > This is a SOCK_RAW socket using protocol 17 (UDP). AFAIK this is dhclient. > > You can see this also with fstat. > > root dhclient 750245* internet dgram udp *:0 > > Should we print a specific headline in netstat to avoid such questions? > > Proto Recv-Q Send-Q Local Address Foreign AddressIP-Proto > ip 0 0 *.**.*17 > > Proto Recv-Q Send-Q Local Address Foreign AddressTCP-State > tcp 0 0 192.168.2.138.3513052.10.128.80.443 > ESTABLISHED > > Proto Recv-Q Send-Q Local Address Foreign Address > udp 0 0 192.168.2.138.31411162.159.200.1.123 > > ok? I like it. OK claudio@ > Index: inet.c > === > RCS file: /data/mirror/openbsd/cvs/src/usr.bin/netstat/inet.c,v > retrieving revision 1.168 > diff -u -p -r1.168 inet.c > --- inet.c15 Jan 2020 14:02:37 - 1.168 > +++ inet.c23 Dec 2020 15:08:45 - > @@ -327,9 +327,10 @@ netdomainpr(struct kinfo_file *kf, int p > if (Bflag && istcp) > printf("%-6.6s %-6.6s %-6.6s ", > "Recv-W", "Send-W", "Cgst-W"); > - printf(" %-*.*s %-*.*s %s\n", > + printf(" %-*.*s %-*.*s%s\n", > addrlen, addrlen, "Local Address", > - addrlen, addrlen, "Foreign Address", "(state)"); > + addrlen, addrlen, "Foreign Address", > + istcp ? " TCP-State" : type == SOCK_RAW ? " IP-Proto" : ""); > } > > if (Aflag) -- :wq Claudio
Re: netstat - proto ip record
On Wed, Dec 16, 2020 at 05:24:50PM +0100, Claudio Jeker wrote: > On Wed, Dec 16, 2020 at 03:54:04PM +, Stuart Henderson wrote: > > On 2020/12/16 16:43, Salvatore Cuzzilla wrote: > > > Hi folks, > > > > > > is there any process associated with this netstat record? > > > btw, what's the meaning of the state field with value '17'? > > > > > > ToTo@obsd ~ $ doas netstat -an -f inet > > > Active Internet connections (including servers) > > > Proto Recv-Q Send-Q Local Address Foreign Address > > > (state) > > > ip 0 0 *.**.*17 > > > > Are kernel and userland in sync? > > This is a SOCK_RAW socket using protocol 17 (UDP). AFAIK this is dhclient. > You can see this also with fstat. > root dhclient 750245* internet dgram udp *:0 Should we print a specific headline in netstat to avoid such questions? Proto Recv-Q Send-Q Local Address Foreign AddressIP-Proto ip 0 0 *.**.*17 Proto Recv-Q Send-Q Local Address Foreign AddressTCP-State tcp 0 0 192.168.2.138.3513052.10.128.80.443 ESTABLISHED Proto Recv-Q Send-Q Local Address Foreign Address udp 0 0 192.168.2.138.31411162.159.200.1.123 ok? bluhm Index: inet.c === RCS file: /data/mirror/openbsd/cvs/src/usr.bin/netstat/inet.c,v retrieving revision 1.168 diff -u -p -r1.168 inet.c --- inet.c 15 Jan 2020 14:02:37 - 1.168 +++ inet.c 23 Dec 2020 15:08:45 - @@ -327,9 +327,10 @@ netdomainpr(struct kinfo_file *kf, int p if (Bflag && istcp) printf("%-6.6s %-6.6s %-6.6s ", "Recv-W", "Send-W", "Cgst-W"); - printf(" %-*.*s %-*.*s %s\n", + printf(" %-*.*s %-*.*s%s\n", addrlen, addrlen, "Local Address", - addrlen, addrlen, "Foreign Address", "(state)"); + addrlen, addrlen, "Foreign Address", + istcp ? " TCP-State" : type == SOCK_RAW ? " IP-Proto" : ""); } if (Aflag)
Re: uvmexp & per-CPU counters
> Date: Wed, 23 Dec 2020 10:58:16 -0300 > From: Martin Pieuchot > > On 22/12/20(Tue) 23:43, Mark Kettenis wrote: > > > Date: Mon, 21 Dec 2020 16:46:32 -0300 > > > From: Martin Pieuchot > > > > > > During a page fault multiples counters are updated. They fall into two > > > categories "fault counters" and "global statistics" both of which are > > > currently represented by int-sized fields inside a global: `uvmexp'. > > > > > > Diff below makes use of the per-CPU counters_inc(9) API to make sure no > > > update is lost with an unlocked fault handler. I only converted the > > > fields touched by uvm_fault() to have a working solution and start a > > > discussion. > > > > > > - Should we keep a single enum for all fields inside `uvmexp' or do we > > > want to separate "statistics counters" which are mostly used sys/arch > > > from "fault counters" which are only used in uvm/uvm_fault.c? I don't know. Keep it simple for now, so maybe stick with what you have. > > > - The counter_add(9) API deals with uint64_t and currently uvmexp uses > > > int. Should we truncate or change the size of uvmexp fields or do > > > something else? Truncating should be good enough. > > > Comments? > > > > I think this breaks "show uvmexp" in ddb. > > Updated diff below fixes that. Any comment on the issues raised above? > > > You fear that using atomic operations for these counters would lead to > > too much bus contention on systems with a large number of CPUs? > > I don't know. I don't see the point of using atomic operations for > "real" counters that are not used to make any decision. Atomic > operations have a high cost, bus contention might be one of them. Sure. If dlg@ thinks this is an appropriate use of the per-CPU counters, this is ok with me. 
> Index: kern/init_main.c > === > RCS file: /cvs/src/sys/kern/init_main.c,v > retrieving revision 1.302 > diff -u -p -r1.302 init_main.c > --- kern/init_main.c 7 Dec 2020 16:55:28 - 1.302 > +++ kern/init_main.c 21 Dec 2020 19:37:13 - > @@ -432,6 +432,7 @@ main(void *framep) > #endif > > mbcpuinit();/* enable per cpu mbuf data */ > + uvm_init_percpu(); > > /* init exec and emul */ > init_exec(); > Index: uvm/uvm_extern.h > === > RCS file: /cvs/src/sys/uvm/uvm_extern.h,v > retrieving revision 1.155 > diff -u -p -r1.155 uvm_extern.h > --- uvm/uvm_extern.h 1 Dec 2020 13:56:22 - 1.155 > +++ uvm/uvm_extern.h 21 Dec 2020 19:37:13 - > @@ -289,6 +289,7 @@ void uvm_vsunlock_device(struct proc > * > void *); > void uvm_pause(void); > void uvm_init(void); > +void uvm_init_percpu(void); > int uvm_io(vm_map_t, struct uio *, int); > > #define UVM_IO_FIXPROT 0x01 > Index: uvm/uvm_fault.c > === > RCS file: /cvs/src/sys/uvm/uvm_fault.c,v > retrieving revision 1.109 > diff -u -p -r1.109 uvm_fault.c > --- uvm/uvm_fault.c 8 Dec 2020 12:26:31 - 1.109 > +++ uvm/uvm_fault.c 21 Dec 2020 19:37:13 - > @@ -35,6 +35,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -271,7 +272,7 @@ uvmfault_anonget(struct uvm_faultinfo *u > int result; > > result = 0; /* XXX shut up gcc */ > - uvmexp.fltanget++; > + counters_inc(uvmexp_counters, flt_anget); > /* bump rusage counters */ > if (anon->an_page) > curproc->p_ru.ru_minflt++; > @@ -295,7 +296,7 @@ uvmfault_anonget(struct uvm_faultinfo *u > if ((pg->pg_flags & (PG_BUSY|PG_RELEASED)) == 0) > return (VM_PAGER_OK); > atomic_setbits_int(>pg_flags, PG_WANTED); > - uvmexp.fltpgwait++; > + counters_inc(uvmexp_counters, flt_pgwait); > > /* >* the last unlock must be an atomic unlock+wait on > @@ -310,7 +311,7 @@ uvmfault_anonget(struct uvm_faultinfo *u > > if (pg == NULL) { /* out of RAM. 
*/ > uvmfault_unlockall(ufi, amap, NULL); > - uvmexp.fltnoram++; > + counters_inc(uvmexp_counters, flt_noram); > uvm_wait("flt_noram1"); > /* ready to relock and try again */ > } else { > @@ -325,7 +326,7 @@ uvmfault_anonget(struct uvm_faultinfo *u >* it is ok to read an_swslot here because >* we hold PG_BUSY on the page. >*/ > - uvmexp.pageins++; > +
bgpd: adjust loopback filter for network statements
In bgpd statements like network inet static or network rtlabel "exportme" will skip routes that use 127.0.0.1 as nexthop. This makes sense for network connected and network static but for rtlabel and even priority based selection this makes less sense. Especially using rtlabel to export routes should give the admin also the option to export reject or blackhole routes (which have their nexthop set to 127.0.0.1). This diff does this change but still skips networks like 224/4 for network inet static. I think this is a decent compromise. -- :wq Claudio Index: kroute.c === RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v retrieving revision 1.239 diff -u -p -r1.239 kroute.c --- kroute.c1 Oct 2019 08:57:48 - 1.239 +++ kroute.c4 Dec 2020 11:31:09 - @@ -110,7 +110,7 @@ int kr6_delete(struct ktable *, struct k intkrVPN4_delete(struct ktable *, struct kroute_full *, u_int8_t); intkrVPN6_delete(struct ktable *, struct kroute_full *, u_int8_t); void kr_net_delete(struct network *); -intkr_net_match(struct ktable *, struct network_config *, u_int16_t); +intkr_net_match(struct ktable *, struct network_config *, u_int16_t, int); struct network *kr_net_find(struct ktable *, struct network *); void kr_net_clear(struct ktable *); void kr_redistribute(int, struct ktable *, struct kroute *); @@ -1318,7 +1318,8 @@ kr_net_redist_del(struct ktable *kt, str } int -kr_net_match(struct ktable *kt, struct network_config *net, u_int16_t flags) +kr_net_match(struct ktable *kt, struct network_config *net, u_int16_t flags, +int loopback) { struct network *xn; @@ -1330,10 +1331,16 @@ kr_net_match(struct ktable *kt, struct n /* static match already redistributed */ continue; case NETWORK_STATIC: + /* Skip networks with nexthop on loopback. */ + if (loopback) + continue; if (flags & F_STATIC) break; continue; case NETWORK_CONNECTED: + /* Skip networks with nexthop on loopback. 
*/ + if (loopback) + continue; if (flags & F_CONNECTED) break; continue; @@ -1419,6 +1426,7 @@ kr_redistribute(int type, struct ktable { struct network_config net; u_int32_t a; + int loflag = 0; bzero(&net, sizeof(net)); net.prefix.aid = AID_INET; @@ -1449,9 +1457,9 @@ kr_redistribute(int type, struct ktable (a >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) return; - /* Consider networks with nexthop loopback as not redistributable. */ + /* Check if the nexthop is the loopback addr. */ if (kr->nexthop.s_addr == htonl(INADDR_LOOPBACK)) - return; + loflag = 1; /* * never allow 0.0.0.0/0 the default route can only be redistributed @@ -1460,7 +1468,7 @@ kr_redistribute(int type, struct ktable if (kr->prefix.s_addr == INADDR_ANY && kr->prefixlen == 0) return; - if (kr_net_match(kt, &net, kr->flags) == 0) + if (kr_net_match(kt, &net, kr->flags, loflag) == 0) /* no longer matches, if still present remove it */ kr_net_redist_del(kt, &net, 1); } @@ -1468,7 +1476,8 @@ kr_redistribute(int type, struct ktable void kr_redistribute6(int type, struct ktable *kt, struct kroute6 *kr6) { - struct network_config net; + struct network_config net; + int loflag = 0; bzero(&net, sizeof(net)); net.prefix.aid = AID_INET6; @@ -1503,11 +1512,9 @@ kr_redistribute6(int type, struct ktable IN6_IS_ADDR_V4COMPAT(&kr6->prefix)) return; - /* -* Consider networks with nexthop loopback as not redistributable. -*/ + /* Check if the nexthop is the loopback addr. */ if (IN6_IS_ADDR_LOOPBACK(&kr6->nexthop)) - return; + loflag = 1; /* * never allow ::/0 the default route can only be redistributed @@ -1517,7 +1524,7 @@ kr_redistribute6(int type, struct ktable memcmp(&kr6->prefix, &in6addr_any, sizeof(struct in6_addr)) == 0) return; - if (kr_net_match(kt, &net, kr6->flags) == 0) + if (kr_net_match(kt, &net, kr6->flags, loflag) == 0) /* no longer matches, if still present remove it */ kr_net_redist_del(kt, &net, 1); }
Re: uvmexp & per-CPU counters
On 22/12/20(Tue) 23:43, Mark Kettenis wrote: > > Date: Mon, 21 Dec 2020 16:46:32 -0300 > > From: Martin Pieuchot > > > > During a page fault multiple counters are updated. They fall into two > > categories "fault counters" and "global statistics" both of which are > > currently represented by int-sized fields inside a global: `uvmexp'. > > > > Diff below makes use of the per-CPU counters_inc(9) API to make sure no > > update is lost with an unlocked fault handler. I only converted the > > fields touched by uvm_fault() to have a working solution and start a > > discussion. > > > > - Should we keep a single enum for all fields inside `uvmexp' or do we > > want to separate "statistics counters" which are mostly used in sys/arch > > from "fault counters" which are only used in uvm/uvm_fault.c? > > > > - The counter_add(9) API deals with uint64_t and currently uvmexp uses > > int. Should we truncate or change the size of uvmexp fields or do > > something else? > > > > Comments? > > I think this breaks "show uvmexp" in ddb. Updated diff below fixes that. Any comment on the issues raised above? > You fear that using atomic operations for these counters would lead to > too much bus contention on systems with a large number of CPUs? I don't know. I don't see the point of using atomic operations for "real" counters that are not used to make any decision. Atomic operations have a high cost, bus contention might be one of them.
Index: kern/init_main.c === RCS file: /cvs/src/sys/kern/init_main.c,v retrieving revision 1.302 diff -u -p -r1.302 init_main.c --- kern/init_main.c7 Dec 2020 16:55:28 - 1.302 +++ kern/init_main.c21 Dec 2020 19:37:13 - @@ -432,6 +432,7 @@ main(void *framep) #endif mbcpuinit();/* enable per cpu mbuf data */ + uvm_init_percpu(); /* init exec and emul */ init_exec(); Index: uvm/uvm_extern.h === RCS file: /cvs/src/sys/uvm/uvm_extern.h,v retrieving revision 1.155 diff -u -p -r1.155 uvm_extern.h --- uvm/uvm_extern.h1 Dec 2020 13:56:22 - 1.155 +++ uvm/uvm_extern.h21 Dec 2020 19:37:13 - @@ -289,6 +289,7 @@ voiduvm_vsunlock_device(struct proc * void *); void uvm_pause(void); void uvm_init(void); +void uvm_init_percpu(void); intuvm_io(vm_map_t, struct uio *, int); #defineUVM_IO_FIXPROT 0x01 Index: uvm/uvm_fault.c === RCS file: /cvs/src/sys/uvm/uvm_fault.c,v retrieving revision 1.109 diff -u -p -r1.109 uvm_fault.c --- uvm/uvm_fault.c 8 Dec 2020 12:26:31 - 1.109 +++ uvm/uvm_fault.c 21 Dec 2020 19:37:13 - @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -271,7 +272,7 @@ uvmfault_anonget(struct uvm_faultinfo *u int result; result = 0; /* XXX shut up gcc */ - uvmexp.fltanget++; + counters_inc(uvmexp_counters, flt_anget); /* bump rusage counters */ if (anon->an_page) curproc->p_ru.ru_minflt++; @@ -295,7 +296,7 @@ uvmfault_anonget(struct uvm_faultinfo *u if ((pg->pg_flags & (PG_BUSY|PG_RELEASED)) == 0) return (VM_PAGER_OK); atomic_setbits_int(>pg_flags, PG_WANTED); - uvmexp.fltpgwait++; + counters_inc(uvmexp_counters, flt_pgwait); /* * the last unlock must be an atomic unlock+wait on @@ -310,7 +311,7 @@ uvmfault_anonget(struct uvm_faultinfo *u if (pg == NULL) { /* out of RAM. 
*/ uvmfault_unlockall(ufi, amap, NULL); - uvmexp.fltnoram++; + counters_inc(uvmexp_counters, flt_noram); uvm_wait("flt_noram1"); /* ready to relock and try again */ } else { @@ -325,7 +326,7 @@ uvmfault_anonget(struct uvm_faultinfo *u * it is ok to read an_swslot here because * we hold PG_BUSY on the page. */ - uvmexp.pageins++; + counters_inc(uvmexp_counters, pageins); result = uvm_swap_get(pg, anon->an_swslot, PGO_SYNCIO); @@ -369,7 +370,7 @@ uvmfault_anonget(struct uvm_faultinfo *u uvm_anfree(anon); /* frees page for us */ if (locked)
Re: kqueue_scan() should not return EWOULDBLOCK
On 23/12/20(Wed) 07:18, Visa Hankala wrote: > This fixes a recent regression in kqueue_scan() where the function can > mistakenly return EWOULDBLOCK. > > Currently, kqueue_scan() does one more scan attempt after a timeout. > Usually, this gives no new events and the function bails out through > the following code. Note that it clears `error'. > > if (kq->kq_count == 0) { > /* > * Successive loops are only necessary if there are more > * ready events to gather, so they don't need to block. > */ > if ((tsp != NULL && !timespecisset(tsp)) || > scan->kqs_nevent != 0) { > splx(s); > error = 0; > goto done; > } > > However, there can be a last-minute event activation, in which case the > function processes the event queue. Unfortunately, the error variable > preserves its value EWOULDBLOCK/EAGAIN that gets returned to the caller. > kevent(2), or select(2) or poll(2), is not supposed to return this error. > > The issue emerged in r1.146 of kern_event.c when the final copyout() was > moved outside kqueue_scan(). The copyout()'s return value used to > override the EWOULDBLOCK. > > The following patch fixes the regression by clearing `error' at the > start of each scan round. The clearing could be done conditionally after > kqueue_sleep(). However, that does not seem as robust. The value could be cleaned before "goto retry", that would be more robust, no? Ok mpi@ either way. > Index: kern/kern_event.c > === > RCS file: src/sys/kern/kern_event.c,v > retrieving revision 1.153 > diff -u -p -r1.153 kern_event.c > --- kern/kern_event.c 20 Dec 2020 12:54:05 - 1.153 > +++ kern/kern_event.c 23 Dec 2020 07:10:24 - > @@ -977,6 +977,8 @@ kqueue_scan(struct kqueue_scan_state *sc > retry: > KASSERT(nkev == 0); > > + error = 0; > + > if (kq->kq_state & KQ_DYING) { > error = EBADF; > goto done; >
Re: diff: tcp ack improvement
On 2020/12/17 20:50, Jan Klemkow wrote: > ping > > On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote: > > Hi, > > > > bluhm and I make some network performance measurements and kernel > > profiling. I've been running this on my workstation since you sent it out - lots of long-running ssh connections, hourly reposync, daily rsync of base snapshots. I don't know enough about TCP stack behaviour to really give a meaningful OK, but certainly not seeing any problems with it. > > Setup: Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf) > > > > We figured out, that the kernel uses a huge amount of processing time > > for sending ACKs to the sender on the receiving interface. After > > receiving a data segment, we send out two ACKs. The first one in > > tcp_input() directly after receiving. The second ACK is sent out after > > the userland or the sosplice task read some data out of the socket > > buffer. > > > > The first ACK in tcp_input() is sent after receiving every other data > > segment, as described in RFC1122: > > > > 4.2.3.2 When to Send an ACK Segment > > A TCP SHOULD implement a delayed ACK, but an ACK should > > not be excessively delayed; in particular, the delay > > MUST be less than 0.5 seconds, and in a stream of > > full-sized segments there SHOULD be an ACK for at least > > every second segment. > > > > This advice is based on the paper "Congestion Avoidance and Control": > > > > 4 THE GATEWAY SIDE OF CONGESTION CONTROL > > The 8 KBps senders were talking to 4.3+BSD receivers > > which would delay an ack for at most one packet (because > > of an ack's 'clock' role, the authors believe that the > > minimum ack frequency should be every other packet). > > > > Sending the first ACK (on every other packet) costs us too much > > processing time. Thus, we run into a full socket buffer earlier. The > > first ACK just acknowledges the received data, but does not update the > > window.
The second ACK, caused by the socket buffer reader, also > > acknowledges the data and also updates the window. So, the second ACK > > is worth much more for fast packet processing than the first one. > > > > The performance improvement is between 33% with splicing and 20% without > > splice: > > > > splicing relaying > > > > current 3.1 GBit/s 2.6 GBit/s > > w/o first ack 4.1 GBit/s 3.1 GBit/s > > > > As far as I understand the implementation of other operating systems: > > Linux has implemented a custom TCP_QUICKACK socket option, to turn this > > kind of feature on and off. FreeBSD and NetBSD still depend on it when > > using the New Reno implementation. > > > > The following diff turns off the direct ACK on every other segment. We > > are running this diff in production on our own machines at genua and on > > our products for several months now. We haven't noticed any problems, > > neither with interactive network sessions (ssh) nor with bulk traffic. > > > > Another solution could be a sysctl(3) or an additional socket option, > > similar to Linux, to control this behavior per socket or system wide. > > Also, a counter to ACK every 3rd, 4th... data segment could beat the > > problem. > > > > bye, > > Jan > > > > Index: netinet/tcp_input.c > > === > > RCS file: /cvs/src/sys/netinet/tcp_input.c,v > > retrieving revision 1.365 > > diff -u -p -r1.365 tcp_input.c > > --- netinet/tcp_input.c 19 Jun 2020 22:47:22 - 1.365 > > +++ netinet/tcp_input.c 5 Nov 2020 23:00:34 - > > @@ -165,8 +165,8 @@ do { \ > > #endif > > > > /* > > - * Macro to compute ACK transmission behavior. Delay the ACK unless > > - * we have already delayed an ACK (must send an ACK every two segments). > > + * Macro to compute ACK transmission behavior. Delay the ACK until > > + * a read from the socket buffer or the delayed ACK timer causes one. > > * We also ACK immediately if we received a PUSH and the ACK-on-PUSH > > * option is enabled or when the packet is coming from a loopback > > * interface.
> > @@ -176,8 +176,7 @@ do { \ > > struct ifnet *ifp = NULL; \ > > if (m && (m->m_flags & M_PKTHDR)) \ > > ifp = if_get(m->m_pkthdr.ph_ifidx); \ > > - if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \ > > - (tcp_ack_on_push && (tiflags) & TH_PUSH) || \ > > + if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \ > > (ifp && (ifp->if_flags & IFF_LOOPBACK))) \ > > tp->t_flags |= TF_ACKNOW; \ > > else \ > > >
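The decision the diff changes can be sketched as a standalone function (a minimal model with hypothetical names; in the kernel this is the TCP_SETUP_ACK macro, and delack_armed stands in for TCP_TIMER_ISARMED(tp, TCPT_DELACK)):

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are illustrative, not the kernel's. */
#define TH_PUSH		0x08
#define TF_ACKNOW	0x0001	/* ACK immediately */
#define TF_DELACK	0x0002	/* wait for reader or delack timer */

/*
 * Model of the ACK decision.  With old_policy set, an already-armed
 * delayed-ACK timer forces an immediate ACK (ACK every other segment);
 * the diff drops that test, so only ACK-on-PUSH and loopback still
 * force an immediate ACK.
 */
static int
setup_ack(bool old_policy, bool delack_armed, int tiflags,
    bool ack_on_push, bool loopback)
{
	if ((old_policy && delack_armed) ||
	    (ack_on_push && (tiflags & TH_PUSH)) || loopback)
		return TF_ACKNOW;
	return TF_DELACK;
}
```

With old_policy cleared, the ACK for a second in-flight segment waits for the socket-buffer reader or the delayed-ACK timer, whose ACK also carries the updated window — which is where the measured speedup comes from.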
[diff] src/usr.sbin/smtpd: plug a memory leak in regex lookups
Hello, The following diff plugs a memory leak in regex lookups. Cheers, diff --git a/usr.sbin/smtpd/table.c b/usr.sbin/smtpd/table.c index 4691..d1578403 100644 --- a/usr.sbin/smtpd/table.c +++ b/usr.sbin/smtpd/table.c @@ -470,6 +470,7 @@ table_regex_match(const char *string, const char *pattern) { regex_t preg; int cflags = REG_EXTENDED|REG_NOSUB; + int ret; if (strncmp(pattern, "(?i)", 4) == 0) { cflags |= REG_ICASE; @@ -479,7 +480,11 @@ table_regex_match(const char *string, const char *pattern) if (regcomp(&preg, pattern, cflags) != 0) return (0); - if (regexec(&preg, string, 0, NULL, 0) != 0) + ret = regexec(&preg, string, 0, NULL, 0); + + regfree(&preg); + + if (ret != 0) return (0); return (1);
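The bug and the fix follow the standard POSIX regex lifecycle: once regcomp() succeeds, the compiled regex_t owns heap memory that regfree() must release on every return path, match or no match. A self-contained sketch of the corrected pattern (regex_match is a hypothetical stand-in for smtpd's table_regex_match, including its assumed "(?i)" prefix handling):

```c
#include <regex.h>
#include <string.h>

/* Return 1 if string matches pattern, 0 otherwise (or on bad pattern). */
static int
regex_match(const char *string, const char *pattern)
{
	regex_t preg;
	int cflags = REG_EXTENDED|REG_NOSUB;
	int ret;

	/* A leading "(?i)" selects a case-insensitive match. */
	if (strncmp(pattern, "(?i)", 4) == 0) {
		cflags |= REG_ICASE;
		pattern += 4;
	}

	if (regcomp(&preg, pattern, cflags) != 0)
		return (0);

	/* Save the result so regfree() runs on both paths. */
	ret = regexec(&preg, string, 0, NULL, 0);
	regfree(&preg);

	return (ret == 0);
}
```

The original code returned straight out of the regexec() test, so every non-matching lookup leaked the compiled pattern.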
Re: Wake on LAN support for rge(4)
On Wed, Dec 23, 2020 at 12:35:46PM +0800, Kevin Lo wrote: > Hi, > > This diff implements WoL support in rge(4). I can wake up the machine with WoL > after suspending it through `zzz` or powering it off through `halt -p`. Thanks! This works as expected in my testing. -Otto > > Index: share/man/man4/rge.4 > === > RCS file: /cvs/src/share/man/man4/rge.4,v > retrieving revision 1.4 > diff -u -p -u -p -r1.4 rge.4 > --- share/man/man4/rge.4 12 Oct 2020 02:11:10 - 1.4 > +++ share/man/man4/rge.4 23 Dec 2020 04:33:26 - > @@ -37,6 +37,15 @@ Rivet Networks Killer E3000 Adapter (250 > .It > TP-LINK TL-NG421 Adapter (2500baseT) > .El > +.Pp > +The > +.Nm > +driver additionally supports Wake on LAN (WoL). > +See > +.Xr arp 8 > +and > +.Xr ifconfig 8 > +for more details. > .Sh SEE ALSO > .Xr arp 4 , > .Xr ifmedia 4 , > Index: sys/dev/pci/if_rge.c > === > RCS file: /cvs/src/sys/dev/pci/if_rge.c,v > retrieving revision 1.9 > diff -u -p -u -p -r1.9 if_rge.c > --- sys/dev/pci/if_rge.c 12 Dec 2020 11:48:53 - 1.9 > +++ sys/dev/pci/if_rge.c 23 Dec 2020 04:33:27 - > @@ -59,6 +59,7 @@ int rge_debug = 0; > > int rge_match(struct device *, void *, void *); > void rge_attach(struct device *, struct device *, void *); > +int rge_activate(struct device *, int); > int rge_intr(void *); > int rge_encap(struct rge_softc *, struct mbuf *, int); > int rge_ioctl(struct ifnet *, u_long, caddr_t); > @@ -111,6 +112,10 @@ int rge_get_link_status(struct rge_soft > void rge_txstart(void *); > void rge_tick(void *); > void rge_link_state(struct rge_softc *); > +#ifndef SMALL_KERNEL > +int rge_wol(struct ifnet *, int); > +void rge_wol_power(struct rge_softc *); > +#endif > > static const struct { > uint16_t reg; > @@ -126,7 +131,7 @@ static const struct { > }; > > struct cfattach rge_ca = { > - sizeof(struct rge_softc), rge_match, rge_attach > + sizeof(struct rge_softc), rge_match, rge_attach, NULL, rge_activate > }; > > struct cfdriver rge_cd = { > @@ -272,6 +277,11 @@ rge_attach(struct device *parent, 
struct > ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; > #endif > > +#ifndef SMALL_KERNEL > + ifp->if_capabilities |= IFCAP_WOL; > + ifp->if_wol = rge_wol; > + rge_wol(ifp, 0); > +#endif > timeout_set(&sc->sc_timeout, rge_tick, sc); > task_set(&sc->sc_task, rge_txstart, sc); > > @@ -288,6 +298,25 @@ rge_attach(struct device *parent, struct > } > > int > +rge_activate(struct device *self, int act) > +{ > + struct rge_softc *sc = (struct rge_softc *)self; > + int rv = 0; > + > + switch (act) { > + case DVACT_POWERDOWN: > + rv = config_activate_children(self, act); > +#ifndef SMALL_KERNEL > + rge_wol_power(sc); > +#endif > + break; > + default: > + rv = config_activate_children(self, act); > + break; > + } > + return (rv); > +} > + > +int > rge_intr(void *arg) > { > struct rge_softc *sc = arg; > @@ -2025,6 +2054,7 @@ rge_hw_init(struct rge_softc *sc) > /* Set PCIe uncorrectable error status. */ > rge_write_csi(sc, 0x108, > rge_read_csi(sc, 0x108) | 0x0010); > + > } > > void > @@ -2391,3 +2421,48 @@ rge_link_state(struct rge_softc *sc) > if_link_state_change(ifp); > } > } > + > +#ifndef SMALL_KERNEL > +int > +rge_wol(struct ifnet *ifp, int enable) > +{ > + struct rge_softc *sc = ifp->if_softc; > + > + if (enable) { > + if (!(RGE_READ_1(sc, RGE_CFG1) & RGE_CFG1_PM_EN)) { > + printf("%s: power management is disabled, " > + "cannot do WOL\n", sc->sc_dev.dv_xname); > + return (ENOTSUP); > + } > + > + } > + > + rge_iff(sc); > + > + if (enable) > + RGE_MAC_SETBIT(sc, 0xc0b6, 0x0001); > + else > + RGE_MAC_CLRBIT(sc, 0xc0b6, 0x0001); > + > + RGE_SETBIT_1(sc, RGE_EECMD, RGE_EECMD_WRITECFG); > + RGE_CLRBIT_1(sc, RGE_CFG5, RGE_CFG5_WOL_LANWAKE | RGE_CFG5_WOL_UCAST | > + RGE_CFG5_WOL_MCAST | RGE_CFG5_WOL_BCAST); > + RGE_CLRBIT_1(sc, RGE_CFG3, RGE_CFG3_WOL_LINK | RGE_CFG3_WOL_MAGIC); > + if (enable) > + RGE_SETBIT_1(sc, RGE_CFG5, RGE_CFG5_WOL_LANWAKE); > + RGE_CLRBIT_1(sc, RGE_EECMD, RGE_EECMD_WRITECFG); > + > + return (0); > +} > + > +void > +rge_wol_power(struct rge_softc *sc) > +{ > + /* Disable 
RXDV gate. */ > + RGE_CLRBIT_1(sc, RGE_PPSW, 0x08); > + DELAY(2000); > + > + RGE_SETBIT_1(sc, RGE_CFG1, RGE_CFG1_PM_EN); > + RGE_SETBIT_1(sc, RGE_CFG2, RGE_CFG2_PMSTS_EN); > +} > +#endif > Index: sys/dev/pci/if_rgereg.h > === >
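The activate hook the diff adds can be modeled with stubs (all names here are hypothetical stand-ins and the DVACT_* values are illustrative; this is not the driver code): on DVACT_POWERDOWN the children are powered down first, then the NIC's wake logic is armed. The case needs its own break, otherwise control falls through into default and config_activate_children() runs a second time.

```c
#include <assert.h>

/* Illustrative stand-ins for the autoconf(9) activate codes. */
#define DVACT_SUSPEND	3
#define DVACT_POWERDOWN	6

static int activate_calls;	/* counts config_activate_children() */
static int wol_armed;

static int
config_activate_children_stub(int act)
{
	(void)act;
	activate_calls++;
	return 0;
}

static void
wol_power_stub(void)
{
	wol_armed = 1;	/* models rge_wol_power(): PM enable bits set */
}

/* Shape of rge_activate(): powerdown arms WoL after the children. */
static int
activate(int act)
{
	int rv;

	switch (act) {
	case DVACT_POWERDOWN:
		rv = config_activate_children_stub(act);
		wol_power_stub();
		break;		/* without this, children run twice */
	default:
		rv = config_activate_children_stub(act);
		break;
	}
	return rv;
}
```

This ordering matters: the children must be quiesced before the parent reprograms its power-management registers for wake.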
Re: xhci zero length transfers 'leak' one transfer buffer count
On Wed, 23 Dec 2020 09:47:44 +0100 Marcus Glocker wrote: > On Tue, 22 Dec 2020 20:55:41 +0100 > Marcus Glocker wrote: > > > > > Did you consider incrementing xx->ntrb instead? > > > > >That doesn't work either, because the status completion code needs > > >xx->ntrb to be correct for the data TD to be handled correctly. > > >Incrementing xx->ntrb means the number of TRBs for the data TD is > > >incorrect, since it includes the (optional) zero TD's TRB. > > > > > >In this case the zero TD allocates a TRB but doesn't do proper > > >accounting, and currently there's no place where this could be > > >accounted properly. > > > > > >In the end it's all software, so I guess the diff will simply have > > >to be bigger than just a one-liner. > > > > > > With the diff below the produced TRB isn't accounted which might > > > > lead > > > > to an off-by-one. > > > > Sorry, I missed this thread and had to re-grab the last mail from > > MARC. > > > > Can't we just take account of the zero trb separately? > > We also want to reset the zerotrb. Re-thinking this again, I think we should only increase the zerotrb to avoid another possible mismatch for free_trbs, and leave the responsibility to the caller of xhci_xfer_get_trb() to request the right amount of zerotrb. Index: dev/usb/xhci.c === RCS file: /cvs/src/sys/dev/usb/xhci.c,v retrieving revision 1.119 diff -u -p -u -p -r1.119 xhci.c --- dev/usb/xhci.c 31 Jul 2020 19:27:57 - 1.119 +++ dev/usb/xhci.c 23 Dec 2020 09:38:58 - @@ -1135,8 +1135,10 @@ xhci_xfer_done(struct usbd_xfer *xfer) i = (xp->ring.ntrb - 1); } xp->free_trbs += xx->ntrb; + xp->free_trbs += xx->zerotrb; xx->index = -1; xx->ntrb = 0; + xx->zerotrb = 0; timeout_del(&xfer->timeout_handle); usb_rem_task(xfer->device, &xfer->abort_task); @@ -1842,6 +1844,7 @@ xhci_xfer_get_trb(struct xhci_softc *sc, switch (last) { case -1:/* This will be a zero-length TD. */ xp->pending_xfers[xp->ring.index] = NULL; + xx->zerotrb += 1; break; case 0: /* This will be in a chain. 
*/ xp->pending_xfers[xp->ring.index] = xfer; Index: dev/usb/xhcivar.h === RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v retrieving revision 1.11 diff -u -p -u -p -r1.11 xhcivar.h --- dev/usb/xhcivar.h 6 Oct 2019 17:30:00 - 1.11 +++ dev/usb/xhcivar.h 23 Dec 2020 09:38:58 - @@ -40,6 +40,7 @@ struct xhci_xfer { struct usbd_xfer xfer; int index; /* Index of the last TRB */ size_t ntrb; /* Number of associated TRBs */ + size_t zerotrb; /* Is zero len TRB required? */ }; struct xhci_ring {
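The invariant the diff restores is easy to model: every TRB taken from the ring — the data TRBs counted in ntrb plus the optional zero-length TD's single TRB counted in zerotrb — must be returned to free_trbs on completion, and both counters must be reset for the next transfer. A hypothetical stand-alone model (not the driver code; structure and function names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct ring {
	size_t free_trbs;	/* TRBs still available on the ring */
};

struct xfer {
	size_t ntrb;		/* data TRBs of the TD */
	size_t zerotrb;		/* 1 if a zero-length TD was appended */
};

/* Take one TRB; the caller flags whether it is the zero-length TD. */
static void
get_trb(struct ring *r, struct xfer *x, int zero_td)
{
	r->free_trbs--;
	if (zero_td)
		x->zerotrb = 1;	/* at most one zero TD per transfer */
	else
		x->ntrb++;
}

/* Completion: return every TRB, including the zero TD's. */
static void
xfer_done(struct ring *r, struct xfer *x)
{
	r->free_trbs += x->ntrb + x->zerotrb;
	x->ntrb = 0;
	x->zerotrb = 0;
}
```

Before the fix, the completion path added back only ntrb, so each zero-length TD left free_trbs one short — the "leak" in the subject line.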
Re: xhci zero length transfers 'leak' one transfer buffer count
On Tue, 22 Dec 2020 20:55:41 +0100 Marcus Glocker wrote: > > > Did you consider incrementing xx->ntrb instead? > > >That doesn't work either, because the status completion code needs > >xx->ntrb to be correct for the data TD to be handled correctly. > >Incrementing xx->ntrb means the number of TRBs for the data TD is > >incorrect, since it includes the (optional) zero TD's TRB. > > > >In this case the zero TD allocates a TRB but doesn't do proper > >accounting, and currently there's no place where this could be > >accounted properly. > > > >In the end it's all software, so I guess the diff will simply have > >to be bigger than just a one-liner. > > > > With the diff below the produced TRB isn't accounted which might > > > lead > > > to an off-by-one. > > Sorry, I missed this thread and had to re-grab the last mail from > MARC. > > Can't we just take account of the zero trb separately? We also want to reset the zerotrb. Index: dev/usb/xhci.c === RCS file: /cvs/src/sys/dev/usb/xhci.c,v retrieving revision 1.119 diff -u -p -u -p -r1.119 xhci.c --- dev/usb/xhci.c 31 Jul 2020 19:27:57 - 1.119 +++ dev/usb/xhci.c 23 Dec 2020 08:45:40 - @@ -1135,8 +1135,10 @@ xhci_xfer_done(struct usbd_xfer *xfer) i = (xp->ring.ntrb - 1); } xp->free_trbs += xx->ntrb; + xp->free_trbs += xx->zerotrb; xx->index = -1; xx->ntrb = 0; + xx->zerotrb = 0; timeout_del(&xfer->timeout_handle); usb_rem_task(xfer->device, &xfer->abort_task); @@ -1842,6 +1844,7 @@ xhci_xfer_get_trb(struct xhci_softc *sc, switch (last) { case -1:/* This will be a zero-length TD. */ xp->pending_xfers[xp->ring.index] = NULL; + xx->zerotrb = 1; /* There can only be one zero TRB per xfer. */ break; case 0: /* This will be in a chain. 
*/ xp->pending_xfers[xp->ring.index] = xfer; Index: dev/usb/xhcivar.h === RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v retrieving revision 1.11 diff -u -p -u -p -r1.11 xhcivar.h --- dev/usb/xhcivar.h 6 Oct 2019 17:30:00 - 1.11 +++ dev/usb/xhcivar.h 23 Dec 2020 08:45:40 - @@ -40,6 +40,7 @@ struct xhci_xfer { struct usbd_xfer xfer; int index; /* Index of the last TRB */ size_t ntrb; /* Number of associated TRBs */ + size_t zerotrb; /* Is zero len TRB required? */ }; struct xhci_ring {
Re: [diff] src/usr.sbin/smtpd: plug a memory leak in regex lookups
Committed, thanks. On Wed, 2020-12-23 at 08:54 +0100, Gilles CHEHADE wrote: > Hello, > > The following diff plugs a memory leak in regex lookups. > > Cheers, > > > diff --git a/usr.sbin/smtpd/table.c b/usr.sbin/smtpd/table.c > index 4691..d1578403 100644 > --- a/usr.sbin/smtpd/table.c > +++ b/usr.sbin/smtpd/table.c > @@ -470,6 +470,7 @@ table_regex_match(const char *string, const char *pattern) > { > regex_t preg; > int cflags = REG_EXTENDED|REG_NOSUB; > + int ret; > > if (strncmp(pattern, "(?i)", 4) == 0) { > cflags |= REG_ICASE; > @@ -479,7 +480,11 @@ table_regex_match(const char *string, const char > *pattern) > if (regcomp(&preg, pattern, cflags) != 0) > return (0); > > - if (regexec(&preg, string, 0, NULL, 0) != 0) > + ret = regexec(&preg, string, 0, NULL, 0); > + > + regfree(&preg); > + > + if (ret != 0) > return (0); > > return (1); >