Re: malloc: change chunk sizes to be multiple of 16 instead of power of 2

2023-02-28 Thread Theo Buehler
On Wed, Mar 01, 2023 at 08:39:08AM +0100, Otto Moerbeek wrote:
> On Wed, Mar 01, 2023 at 08:31:47AM +0100, Theo Buehler wrote:
> 
> > On Tue, Feb 28, 2023 at 05:52:28PM +0100, Otto Moerbeek wrote:
> > > Second iteration.
> > > 
> > > Gain back performance by allocation chunk_info pages in a bundle, and
> > > use less buckets is !malloc option S. The chunk sizes used are 16, 32,
> > > 48, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384, 448, 512, 640,
> > > 768, 896, 1024, 1280, 1536, 1792, 2048 (and a few more for sparc84
> > > with it's 8k sized pages and loongson with it's 16k pages).
> > > 
> > > If malloc option S (or rather cache size 0) is used we use strict
> > > multiple of 16 buckets, to get as many buckets as possible.
> > > 
> > > See the find_bucket() and bin_of() functions. Thanks to Tony Finch for
> > > pointing me to code to compute nice bucket sizes.
> > > 
> > > I think this is ready for review and wide testing.
> > 
> > Two vala-based ports, graphics/birdfont and productivity/minder, run out
> > of memory when attempting to build them with this diff (and its previous
> > version) on both amd64 and arm64:
> > 
> > ***MEMORY-ERROR***: valac[93681]: GSlice: failed to allocate 2032 bytes 
> > (alignment: 2048): Cannot allocate memory
> 
> Thanks, this smells like a bug in the aligned mem case.
> 
> + pof2 = 1 << MALLOC_MINSIZE;
> 
> should be 
>   
> + pof2 = MALLOC_MINSIZE;
> 
> By the looks of it. I'll get back to this.

I can confirm that changing this fixes this issue with both ports on
amd64 and arm64.

> 
>   -Otto
> 
> > 
> > Abort trap (core dumped)
> > 
> > To be able to build birdfont with PORTS_PRIVSEP = Yes, I had to bump
> > _pbuild's datasize-cur to 15G, while 14G was not enough. That's nearly
> > double the current default. On amd64 without this diff, birdfont builds
> > comfortably with a datasize-cur of 1G.
> > 
> > birdfont may be easier to investigate since the error happens early in
> > the build. You can get there relatively quickly by doing
> > 
> > cd /usr/ports/graphics/birdfont
> > doas pkg_add birdfont
> > make FETCH_PACKAGES= prepare
> > make
> > 
> > Not sure if the top of the trace is of much use. Here it is:
> > 
> > #0  thrkill () at /tmp/-:3
> > #1  0x486dd8c0aacac468 in ?? ()
> > #2  0x0c3a34319d0e in _libc_abort () at 
> > /usr/src/lib/libc/stdlib/abort.c:51
> > #3  0x0c39b735724b in mem_error () from 
> > /usr/local/lib/libglib-2.0.so.4201.9
> > #4  0x0c39b735604f in slab_allocator_alloc_chunk () from 
> > /usr/local/lib/libglib-2.0.so.4201.9
> > #5  0x0c39b7355a95 in g_slice_alloc () from 
> > /usr/local/lib/libglib-2.0.so.4201.9
> > #6  0x0c39b735606e in g_slice_alloc0 () from 
> > /usr/local/lib/libglib-2.0.so.4201.9
> > #7  0x0c396c2675f5 in g_type_create_instance () from 
> > /usr/local/lib/libgobject-2.0.so.4200.16
> > #8  0x0c3a19ad in vala_data_type_construct_with_symbol ()
> >from /usr/local/lib/libvala-0.56.so.0.0
> > #9  0x0c3a19b4ecae in vala_integer_type_construct () from 
> > /usr/local/lib/libvala-0.56.so.0.0
> > #10 0x0c3a19b4f08c in vala_integer_type_real_copy () from 
> > /usr/local/lib/libvala-0.56.so.0.0
> > #11 0x0c3a19a9d46f in vala_assignment_real_check () from 
> > /usr/local/lib/libvala-0.56.so.0.0
> > #12 0x0c3a19ae54e5 in vala_expression_statement_real_check ()
> >from /usr/local/lib/libvala-0.56.so.0.0
> > #13 0x0c3a19aa6a0d in vala_block_real_check () from 
> > /usr/local/lib/libvala-0.56.so.0.0



Re: malloc: change chunk sizes to be multiple of 16 instead of power of 2

2023-02-28 Thread Otto Moerbeek
On Wed, Mar 01, 2023 at 08:31:47AM +0100, Theo Buehler wrote:

> On Tue, Feb 28, 2023 at 05:52:28PM +0100, Otto Moerbeek wrote:
> > Second iteration.
> > 
> > Gain back performance by allocation chunk_info pages in a bundle, and
> > use less buckets is !malloc option S. The chunk sizes used are 16, 32,
> > 48, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384, 448, 512, 640,
> > 768, 896, 1024, 1280, 1536, 1792, 2048 (and a few more for sparc84
> > with it's 8k sized pages and loongson with it's 16k pages).
> > 
> > If malloc option S (or rather cache size 0) is used we use strict
> > multiple of 16 buckets, to get as many buckets as possible.
> > 
> > See the find_bucket() and bin_of() functions. Thanks to Tony Finch for
> > pointing me to code to compute nice bucket sizes.
> > 
> > I think this is ready for review and wide testing.
> 
> Two vala-based ports, graphics/birdfont and productivity/minder, run out
> of memory when attempting to build them with this diff (and its previous
> version) on both amd64 and arm64:
> 
> ***MEMORY-ERROR***: valac[93681]: GSlice: failed to allocate 2032 bytes 
> (alignment: 2048): Cannot allocate memory

Thanks, this smells like a bug in the aligned mem case.

+   pof2 = 1 << MALLOC_MINSIZE;

should be 

+   pof2 = MALLOC_MINSIZE;

By the looks of it. I'll get back to this.

-Otto

> 
> Abort trap (core dumped)
> 
> To be able to build birdfont with PORTS_PRIVSEP = Yes, I had to bump
> _pbuild's datasize-cur to 15G, while 14G was not enough. That's nearly
> double the current default. On amd64 without this diff, birdfont builds
> comfortably with a datasize-cur of 1G.
> 
> birdfont may be easier to investigate since the error happens early in
> the build. You can get there relatively quickly by doing
> 
> cd /usr/ports/graphics/birdfont
> doas pkg_add birdfont
> make FETCH_PACKAGES= prepare
> make
> 
> Not sure if the top of the trace is of much use. Here it is:
> 
> #0  thrkill () at /tmp/-:3
> #1  0x486dd8c0aacac468 in ?? ()
> #2  0x0c3a34319d0e in _libc_abort () at 
> /usr/src/lib/libc/stdlib/abort.c:51
> #3  0x0c39b735724b in mem_error () from 
> /usr/local/lib/libglib-2.0.so.4201.9
> #4  0x0c39b735604f in slab_allocator_alloc_chunk () from 
> /usr/local/lib/libglib-2.0.so.4201.9
> #5  0x0c39b7355a95 in g_slice_alloc () from 
> /usr/local/lib/libglib-2.0.so.4201.9
> #6  0x0c39b735606e in g_slice_alloc0 () from 
> /usr/local/lib/libglib-2.0.so.4201.9
> #7  0x0c396c2675f5 in g_type_create_instance () from 
> /usr/local/lib/libgobject-2.0.so.4200.16
> #8  0x0c3a19ad in vala_data_type_construct_with_symbol ()
>from /usr/local/lib/libvala-0.56.so.0.0
> #9  0x0c3a19b4ecae in vala_integer_type_construct () from 
> /usr/local/lib/libvala-0.56.so.0.0
> #10 0x0c3a19b4f08c in vala_integer_type_real_copy () from 
> /usr/local/lib/libvala-0.56.so.0.0
> #11 0x0c3a19a9d46f in vala_assignment_real_check () from 
> /usr/local/lib/libvala-0.56.so.0.0
> #12 0x0c3a19ae54e5 in vala_expression_statement_real_check ()
>from /usr/local/lib/libvala-0.56.so.0.0
> #13 0x0c3a19aa6a0d in vala_block_real_check () from 
> /usr/local/lib/libvala-0.56.so.0.0



Re: malloc: change chunk sizes to be multiple of 16 instead of power of 2

2023-02-28 Thread Theo Buehler
On Tue, Feb 28, 2023 at 05:52:28PM +0100, Otto Moerbeek wrote:
> Second iteration.
> 
> Gain back performance by allocation chunk_info pages in a bundle, and
> use less buckets is !malloc option S. The chunk sizes used are 16, 32,
> 48, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384, 448, 512, 640,
> 768, 896, 1024, 1280, 1536, 1792, 2048 (and a few more for sparc84
> with it's 8k sized pages and loongson with it's 16k pages).
> 
> If malloc option S (or rather cache size 0) is used we use strict
> multiple of 16 buckets, to get as many buckets as possible.
> 
> See the find_bucket() and bin_of() functions. Thanks to Tony Finch for
> pointing me to code to compute nice bucket sizes.
> 
> I think this is ready for review and wide testing.

Two vala-based ports, graphics/birdfont and productivity/minder, run out
of memory when attempting to build them with this diff (and its previous
version) on both amd64 and arm64:

***MEMORY-ERROR***: valac[93681]: GSlice: failed to allocate 2032 bytes 
(alignment: 2048): Cannot allocate memory

Abort trap (core dumped)

To be able to build birdfont with PORTS_PRIVSEP = Yes, I had to bump
_pbuild's datasize-cur to 15G, while 14G was not enough. That's nearly
double the current default. On amd64 without this diff, birdfont builds
comfortably with a datasize-cur of 1G.

birdfont may be easier to investigate since the error happens early in
the build. You can get there relatively quickly by doing

cd /usr/ports/graphics/birdfont
doas pkg_add birdfont
make FETCH_PACKAGES= prepare
make

Not sure if the top of the trace is of much use. Here it is:

#0  thrkill () at /tmp/-:3
#1  0x486dd8c0aacac468 in ?? ()
#2  0x0c3a34319d0e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
#3  0x0c39b735724b in mem_error () from /usr/local/lib/libglib-2.0.so.4201.9
#4  0x0c39b735604f in slab_allocator_alloc_chunk () from 
/usr/local/lib/libglib-2.0.so.4201.9
#5  0x0c39b7355a95 in g_slice_alloc () from 
/usr/local/lib/libglib-2.0.so.4201.9
#6  0x0c39b735606e in g_slice_alloc0 () from 
/usr/local/lib/libglib-2.0.so.4201.9
#7  0x0c396c2675f5 in g_type_create_instance () from 
/usr/local/lib/libgobject-2.0.so.4200.16
#8  0x0c3a19ad in vala_data_type_construct_with_symbol ()
   from /usr/local/lib/libvala-0.56.so.0.0
#9  0x0c3a19b4ecae in vala_integer_type_construct () from 
/usr/local/lib/libvala-0.56.so.0.0
#10 0x0c3a19b4f08c in vala_integer_type_real_copy () from 
/usr/local/lib/libvala-0.56.so.0.0
#11 0x0c3a19a9d46f in vala_assignment_real_check () from 
/usr/local/lib/libvala-0.56.so.0.0
#12 0x0c3a19ae54e5 in vala_expression_statement_real_check ()
   from /usr/local/lib/libvala-0.56.so.0.0
#13 0x0c3a19aa6a0d in vala_block_real_check () from 
/usr/local/lib/libvala-0.56.so.0.0



Re: update: xf86-video-amdgpu 23.0.0

2023-02-28 Thread Jonathan Gray
On Mon, Feb 27, 2023 at 09:17:51PM +0100, Matthieu Herrb wrote:
> Hi,
> 
> the patch below update the amdgpu(4) X.Org driver to version 23.0.0
> 
> To test (on machines with an AMD / Radeon GPU) 
> 
> # cd /usr/xenocara/driver/xf86-video-amdgpu
> # patch -p0 -E < /path/to/this/file
> # make -f Makefile.bsd-wrapper obj
> # make -f Makefile.bsd-wrapper build
> 
> and restart xenodm(1)
> 
> Tests and oks welcome...

tested on:

amd64, t14 gen 1 amd, builtin eDP and external DP
amdgpu0: RENOIR GC 9.3.0 6 CU rev 0x00

amd64, t495, builtin eDP
amdgpu0: PICASSO GC 9.1.0 10 CU rev 0x01

amd64, vega56, DP to DVI
amdgpu0: VEGA10 GC 9.0.1 56 CU rev 0x01

ok jsg@



Re: malloc: change chunk sizes to be multiple of 16 instead of power of 2

2023-02-28 Thread Otto Moerbeek
Second iteration.

Gain back performance by allocation chunk_info pages in a bundle, and
use less buckets is !malloc option S. The chunk sizes used are 16, 32,
48, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384, 448, 512, 640,
768, 896, 1024, 1280, 1536, 1792, 2048 (and a few more for sparc84
with it's 8k sized pages and loongson with it's 16k pages).

If malloc option S (or rather cache size 0) is used we use strict
multiple of 16 buckets, to get as many buckets as possible.

See the find_bucket() and bin_of() functions. Thanks to Tony Finch for
pointing me to code to compute nice bucket sizes.

I think this is ready for review and wide testing.

-Otto

Index: stdlib/malloc.c
===
RCS file: /home/cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.277
diff -u -p -r1.277 malloc.c
--- stdlib/malloc.c 27 Feb 2023 06:47:54 -  1.277
+++ stdlib/malloc.c 28 Feb 2023 16:49:08 -
@@ -67,6 +67,11 @@
 #define MALLOC_CHUNK_LISTS 4
 #define CHUNK_CHECK_LENGTH 32
 
+#define B2SIZE(b)  ((b) * MALLOC_MINSIZE)
+#define B2ALLOC(b) ((b) == 0 ? MALLOC_MINSIZE : \
+   (b) * MALLOC_MINSIZE)
+#define BUCKETS(MALLOC_MAXCHUNK / MALLOC_MINSIZE)
+
 /*
  * We move allocations between half a page and a whole page towards the end,
  * subject to alignment constraints. This is the extra headroom we allow.
@@ -144,9 +149,9 @@ struct dir_info {
int mutex;
int malloc_mt;  /* multi-threaded mode? */
/* lists of free chunk info structs */
-   struct chunk_head chunk_info_list[MALLOC_MAXSHIFT + 1];
+   struct chunk_head chunk_info_list[BUCKETS + 1];
/* lists of chunks with free slots */
-   struct chunk_head chunk_dir[MALLOC_MAXSHIFT + 1][MALLOC_CHUNK_LISTS];
+   struct chunk_head chunk_dir[BUCKETS + 1][MALLOC_CHUNK_LISTS];
/* delayed free chunk slots */
void *delayed_chunks[MALLOC_DELAYED_CHUNK_MASK + 1];
u_char rbytes[32];  /* random bytes */
@@ -155,6 +160,8 @@ struct dir_info {
size_t bigcache_used;
size_t bigcache_size;
struct bigcache *bigcache;
+   void *chunk_pages;
+   size_t chunk_pages_used;
 #ifdef MALLOC_STATS
size_t inserts;
size_t insert_collisions;
@@ -195,8 +202,7 @@ struct chunk_info {
LIST_ENTRY(chunk_info) entries;
void *page; /* pointer to the page */
u_short canary;
-   u_short size;   /* size of this page's chunks */
-   u_short shift;  /* how far to shift for this size */
+   u_short bucket;
u_short free;   /* how many free chunks */
u_short total;  /* how many chunks */
u_short offset; /* requested size table offset */
@@ -247,11 +253,11 @@ static void malloc_exit(void);
 #endif
 
 /* low bits of r->p determine size: 0 means >= page size and r->size holding
- * real size, otherwise low bits are a shift count, or 1 for malloc(0)
+ * real size, otherwise low bits is the bucket + 1
  */
 #define REALSIZE(sz, r)\
(sz) = (uintptr_t)(r)->p & MALLOC_PAGEMASK, \
-   (sz) = ((sz) == 0 ? (r)->size : ((sz) == 1 ? 0 : (1 << ((sz)-1
+   (sz) = ((sz) == 0 ? (r)->size : B2SIZE((sz) - 1))
 
 static inline void
 _MALLOC_LEAVE(struct dir_info *d)
@@ -502,7 +508,7 @@ omalloc_poolinit(struct dir_info *d, int
d->r = NULL;
d->rbytesused = sizeof(d->rbytes);
d->regions_free = d->regions_total = 0;
-   for (i = 0; i <= MALLOC_MAXSHIFT; i++) {
+   for (i = 0; i <= BUCKETS; i++) {
LIST_INIT(>chunk_info_list[i]);
for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
LIST_INIT(>chunk_dir[i][j]);
@@ -720,7 +726,7 @@ unmap(struct dir_info *d, void *p, size_
 
/* don't look through all slots */
for (j = 0; j < d->bigcache_size / 4; j++) {
-   i = (base + j) % d->bigcache_size;
+   i = (base + j) & (d->bigcache_size - 1);
if (d->bigcache_used <
BIGCACHE_FILL(d->bigcache_size))  {
if (d->bigcache[i].psize == 0)
@@ -764,10 +770,13 @@ unmap(struct dir_info *d, void *p, size_
}
cache = >smallcache[psz - 1];
if (cache->length == cache->max) {
+   int fresh;
/* use a random slot */
-   i = getrbyte(d) % cache->max;
+   i = getrbyte(d) & (cache->max - 1);
r = cache->pages[i];
-   if (!mopts.malloc_freeunmap)
+   fresh = (uintptr_t)r & 1;
+   

Re: pf(4) drops valid IGMP/MLD messages

2023-02-28 Thread Alexandr Nedvedicky
Hello Matthieu,



On Tue, Feb 28, 2023 at 02:05:50PM +0100, Matthieu Herrb wrote:
> > --- a/sys/net/pf.c
> > +++ b/sys/net/pf.c
> > @@ -6846,8 +6846,12 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h, 
> > u_short *reason)
> > pd->proto = h->ip_p;
> > /* IGMP packets have router alert options, allow them */
> > if (pd->proto == IPPROTO_IGMP) {
> > -   /* According to RFC 1112 ttl must be set to 1. */
> > -   if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> > +   /*
> > +* According to RFC 1112 ttl must be set to 1 in all IGMP
> > +* packets sent do 224.0.0.1
> > +*/
> > +   if ((h->ip_ttl != 1) &&
> > +   (h->ip_dst.s_addr == INADDR_ALLHOSTS_GROUP)) {
> > DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> > REASON_SET(reason, PFRES_IPOPTIONS);
> > return (PF_DROP);
> > 
> 
> 
> Hi,
> 
> The expression look wrong to me again.
> I read in RFC1112 that correct packets should have:
> 
>   (h->ip_ttl == 1 && h->ip.ip_dst.s_addr == INADDR_ALLHOST_GROUP)

the expression above is true for for Query/Report messages
which are a sub-set of IGMP protocol. See Luca's emails with
references to RFCs which further extend IGMP protocol.

here in this particular place (pf_walk_header()) I think we really
want to simply discard all IGMP datagrams sent to 224.0.0.1 with TTL
different to 1.  anything else should be passed further up and let pf(4)
rules to make a decision on those cases.

thanks and
regards
sashan



Re: pf(4) drops valid IGMP/MLD messages

2023-02-28 Thread Luca Di Gregorio
Hi Matthieu,

DVMRP messages are sent with IGMP protocol. Some Multicast Control messages
(Query, Report) have an IP Destination Address belonging to 224.0.0.0/4,
with TTL=1.
Other DVMRP multicast control messages (Prune, Graft, Graft-Ack) have an IP
Destination Address = Unicast Address of the adjacent multicast router,
with TTL >1.

After revision 1.1128, PF drops IGMP messages with TTL != 1 or IP
Destination Address ! IN_MULTICAST.

I think that your proposed test to drop invalid IGMP is too restrictive.
In fact, it would block not only Prune, Graft, Graft-Ack messages (the  IP
Destination Address is Unicast and TTL > 1), but also:

 - DVMRP Probe and Report (TTL = 1 but IP Destination Address = 224.0.0.4 -
DVMRP Routers)
 - Multicast Control messages sent to 224.0.0.2 (All Routers on this Subnet)
 - Multicast Control messages sent to 224.0.0.22  (IGMP)

For reference:
https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml

Regards
Luca



Il giorno mar 28 feb 2023 alle ore 14:05 Matthieu Herrb 
ha scritto:

> On Mon, Feb 27, 2023 at 12:04:54AM +0100, Alexandr Nedvedicky wrote:
> > Hello,
> >
> > 
> > > >
> 8<---8<---8<--8<
> > > > diff --git a/sys/net/pf.c b/sys/net/pf.c
> > > > index 8cb1326a160..c328109026c 100644
> > > > --- a/sys/net/pf.c
> > > > +++ b/sys/net/pf.c
> > > > @@ -6847,7 +6847,7 @@ pf_walk_header(struct pf_pdesc *pd, struct ip
> *h, u_short *reason)
> > > >   /* IGMP packets have router alert options, allow them */
> > > >   if (pd->proto == IPPROTO_IGMP) {
> > > >   /* According to RFC 1112 ttl must be set to 1. */
> > > > - if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> > > > + if ((h->ip_ttl != 1) && IN_MULTICAST(h->ip_dst.s_addr)) {
> > > >   DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> > > >   REASON_SET(reason, PFRES_IPOPTIONS);
> > > >   return (PF_DROP);
> > > > @@ -7101,8 +7101,8 @@ pf_walk_header6(struct pf_pdesc *pd, struct
> ip6_hdr *h, u_short *reason)
> > > >* missing then MLD message is invalid and
> > > >* should be discarded.
> > > >*/
> > > > - if ((h->ip6_hlim != 1) ||
> > > > - !IN6_IS_ADDR_LINKLOCAL(>ip6_src)) {
> > > > + if ((h->ip6_hlim != 1) &&
> > > > + IN6_IS_ADDR_LINKLOCAL(>ip6_src)) {
> > > >   DPFPRINTF(LOG_NOTICE, "Invalid
> MLD");
> > > >   REASON_SET(reason,
> PFRES_IPOPTIONS);
> > > >   return (PF_DROP);
> > > >
> > >
> > > Unless I'm missing more context, this hunk looks wrong to me. Valid
> > > MLD packets must have a ttl of 1 *and* come from a LL address. The
> > > initial logic seems ok to me.
> > >
> >
> > yes you are right. Your comment made me to take better look
> > at RFC 1112 [1]. Section 'Informal Protocol Description'
> > reads as follows:
> >
> >Multicast routers send Host Membership Query messages (hereinafter
> >called Queries) to discover which host groups have members on
> their
> >attached local networks.  Queries are addressed to the all-hosts
> >group (address 224.0.0.1), and carry an IP time-to-live of 1.
> >
> > I think I've confused all-hosts group (224.0.0.1) with any multicast
> > address (any class-D 224.0.0.0). I think the diff below is what we
> > actually need  to get IPv4 IGMP working again:
> >
> > [1] https://www.ietf.org/rfc/rfc1112.txt
> >
> > 8<---8<---8<--8<
> > diff --git a/sys/net/pf.c b/sys/net/pf.c
> > index 8cb1326a160..c50173186da 100644
> > --- a/sys/net/pf.c
> > +++ b/sys/net/pf.c
> > @@ -6846,8 +6846,12 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h,
> u_short *reason)
> >   pd->proto = h->ip_p;
> >   /* IGMP packets have router alert options, allow them */
> >   if (pd->proto == IPPROTO_IGMP) {
> > - /* According to RFC 1112 ttl must be set to 1. */
> > - if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> > + /*
> > +  * According to RFC 1112 ttl must be set to 1 in all IGMP
> > +  * packets sent do 224.0.0.1
> > +  */
> > + if ((h->ip_ttl != 1) &&
> > + (h->ip_dst.s_addr == INADDR_ALLHOSTS_GROUP)) {
> >   DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> >   REASON_SET(reason, PFRES_IPOPTIONS);
> >   return (PF_DROP);
> >
>
>
> Hi,
>
> The expression look wrong to me again.
> I read in RFC1112 that correct packets should have:
>
>   (h->ip_ttl == 1 && h->ip.ip_dst.s_addr == INADDR_ALLHOST_GROUP)
>
> so the test to drop invalid IGMP should be:
>
>if (h->ip_ttl != 1 || 

[patch] usr.sbin/smtpd filter localhost relays

2023-02-28 Thread Philipp
Hi

On github someone reported an issue[0] regarding localhost MX entries.
Currently smtpd will just use the localhost relay. This leads to a
loop. Here a patch filtering localhost and localhost addresses for MX
requests.

As next step you could implement Null-MX (rfc 7505).

Philipp

[0] https://github.com/OpenSMTPD/OpenSMTPD/issues/1190

diff --git a/usr.sbin/smtpd/dns.c b/usr.sbin/smtpd/dns.c
index f7c6b3df..7389efec 100644
--- a/usr.sbin/smtpd/dns.c
+++ b/usr.sbin/smtpd/dns.c
@@ -53,6 +53,7 @@ struct dns_lookup {
struct dns_session  *session;
char*host;
int  preference;
+   int  filter_localhost;
 };
 
 struct dns_session {
@@ -65,7 +66,7 @@ struct dns_session {
int  refcount;
 };
 
-static void dns_lookup_host(struct dns_session *, const char *, int);
+static void dns_lookup_host(struct dns_session *, const char *, int, int);
 static void dns_dispatch_host(struct asr_result *, void *);
 static void dns_dispatch_mx(struct asr_result *, void *);
 static void dns_dispatch_mx_preference(struct asr_result *, void *);
@@ -139,7 +140,7 @@ dns_imsg(struct mproc *p, struct imsg *imsg)
case IMSG_MTA_DNS_HOST:
m_get_string(, );
m_end();
-   dns_lookup_host(s, host, -1);
+   dns_lookup_host(s, host, -1, 0);
return;
 
case IMSG_MTA_DNS_MX:
@@ -205,6 +206,28 @@ dns_imsg(struct mproc *p, struct imsg *imsg)
}
 }
 
+static int
+is_localhost(struct sockaddr *sa)
+{
+   struct sockaddr_in6 *ipv6;
+   struct sockaddr_in *ipv4;
+   uint32_t addr;
+
+   switch (sa->sa_family) {
+   case AF_INET6:
+   // May check also for v6 mapped v4 addresses
+   ipv6 = (struct sockaddr_in6 *)sa;
+   return IN6_IS_ADDR_LOOPBACK(>sin6_addr);
+   case AF_INET:
+   ipv4 = (struct sockaddr_in *)sa;
+   addr = ntohl(ipv4->sin_addr.s_addr);
+   return ((addr >> 24) & 0xff) == 127;
+   default:
+   log_warnx("warn: unknown family in sockaddr");
+   }
+   return 0;
+}
+
 static void
 dns_dispatch_host(struct asr_result *ar, void *arg)
 {
@@ -215,6 +238,10 @@ dns_dispatch_host(struct asr_result *ar, void *arg)
s = lookup->session;
 
for (ai = ar->ar_addrinfo; ai; ai = ai->ai_next) {
+   if (lookup->filter_localhost && is_localhost(ai->ai_addr)) {
+   log_warnx("warn: ignore localhost address for host %s", 
lookup->host);
+   continue;
+   }
s->mxfound++;
m_create(s->p, IMSG_MTA_DNS_HOST, 0, 0, -1);
m_add_id(s->p, s->reqid);
@@ -280,14 +307,18 @@ dns_dispatch_mx(struct asr_result *ar, void *arg)
continue;
print_dname(rr.rr.mx.exchange, buf, sizeof(buf));
buf[strlen(buf) - 1] = '\0';
-   dns_lookup_host(s, buf, rr.rr.mx.preference);
+   if (strcasecmp("localhost", buf)==0) {
+   log_warnx("ignore localhost MX-entry for domain <%s>", 
lookup->host);
+   continue;
+   }
+   dns_lookup_host(s, buf, rr.rr.mx.preference, 1);
found++;
}
free(ar->ar_data);
 
/* fallback to host if no MX is found. */
if (found == 0)
-   dns_lookup_host(s, s->name, 0);
+   dns_lookup_host(s, s->name, 0, 1);
 }
 
 static void
@@ -340,7 +371,7 @@ dns_dispatch_mx_preference(struct asr_result *ar, void *arg)
 }
 
 static void
-dns_lookup_host(struct dns_session *s, const char *host, int preference)
+dns_lookup_host(struct dns_session *s, const char *host, int preference, int 
filter_localhost)
 {
struct dns_lookup   *lookup;
struct addrinfo  hints;
@@ -350,6 +381,7 @@ dns_lookup_host(struct dns_session *s, const char *host, 
int preference)
 
lookup = xcalloc(1, sizeof *lookup);
lookup->preference = preference;
+   lookup->filter_localhost = filter_localhost;
lookup->host = xstrdup(host);
lookup->session = s;
s->refcount++;



Re: pf(4) drops valid IGMP/MLD messages

2023-02-28 Thread Matthieu Herrb
On Mon, Feb 27, 2023 at 12:04:54AM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> > > 8<---8<---8<--8<
> > > diff --git a/sys/net/pf.c b/sys/net/pf.c
> > > index 8cb1326a160..c328109026c 100644
> > > --- a/sys/net/pf.c
> > > +++ b/sys/net/pf.c
> > > @@ -6847,7 +6847,7 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h, 
> > > u_short *reason)
> > >   /* IGMP packets have router alert options, allow them */
> > >   if (pd->proto == IPPROTO_IGMP) {
> > >   /* According to RFC 1112 ttl must be set to 1. */
> > > - if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> > > + if ((h->ip_ttl != 1) && IN_MULTICAST(h->ip_dst.s_addr)) {
> > >   DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> > >   REASON_SET(reason, PFRES_IPOPTIONS);
> > >   return (PF_DROP);
> > > @@ -7101,8 +7101,8 @@ pf_walk_header6(struct pf_pdesc *pd, struct ip6_hdr 
> > > *h, u_short *reason)
> > >* missing then MLD message is invalid and
> > >* should be discarded.
> > >*/
> > > - if ((h->ip6_hlim != 1) ||
> > > - !IN6_IS_ADDR_LINKLOCAL(>ip6_src)) {
> > > + if ((h->ip6_hlim != 1) &&
> > > + IN6_IS_ADDR_LINKLOCAL(>ip6_src)) {
> > >   DPFPRINTF(LOG_NOTICE, "Invalid MLD");
> > >   REASON_SET(reason, PFRES_IPOPTIONS);
> > >   return (PF_DROP);
> > > 
> > 
> > Unless I'm missing more context, this hunk looks wrong to me. Valid
> > MLD packets must have a ttl of 1 *and* come from a LL address. The
> > initial logic seems ok to me.
> > 
> 
> yes you are right. Your comment made me to take better look
> at RFC 1112 [1]. Section 'Informal Protocol Description'
> reads as follows:
> 
>Multicast routers send Host Membership Query messages (hereinafter
>called Queries) to discover which host groups have members on their
>attached local networks.  Queries are addressed to the all-hosts
>group (address 224.0.0.1), and carry an IP time-to-live of 1.
> 
> I think I've confused all-hosts group (224.0.0.1) with any multicast
> address (any class-D 224.0.0.0). I think the diff below is what we
> actually need  to get IPv4 IGMP working again:
> 
> [1] https://www.ietf.org/rfc/rfc1112.txt
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/pf.c b/sys/net/pf.c
> index 8cb1326a160..c50173186da 100644
> --- a/sys/net/pf.c
> +++ b/sys/net/pf.c
> @@ -6846,8 +6846,12 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h, 
> u_short *reason)
>   pd->proto = h->ip_p;
>   /* IGMP packets have router alert options, allow them */
>   if (pd->proto == IPPROTO_IGMP) {
> - /* According to RFC 1112 ttl must be set to 1. */
> - if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> + /*
> +  * According to RFC 1112 ttl must be set to 1 in all IGMP
> +  * packets sent do 224.0.0.1
> +  */
> + if ((h->ip_ttl != 1) &&
> + (h->ip_dst.s_addr == INADDR_ALLHOSTS_GROUP)) {
>   DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
>   REASON_SET(reason, PFRES_IPOPTIONS);
>   return (PF_DROP);
> 


Hi,

The expression look wrong to me again.
I read in RFC1112 that correct packets should have:

  (h->ip_ttl == 1 && h->ip.ip_dst.s_addr == INADDR_ALLHOST_GROUP)

so the test to drop invalid IGMP should be:

   if (h->ip_ttl != 1 || h->ip.ip_dst.s_addr != INADDR_ALLHOST_GROUP) {
   ...
   return (PF_DROP
   }

Again I may be missing something
-- 
Matthieu Herrb



bgpd, improve RFC9234 support

2023-02-28 Thread Claudio Jeker
When I implemented RFC9234 support I was a bit to conservative and only
enabled the loop detection if the capability was enabled. This is
how every other capability works but RFC9234 is special.
On top of this with the addition of ASPA support the way BGP roles are
handled changed and the code turned up a bit strange.
Considering all of this and rereading RFC9234 I came up with this diff:

- add role output to bgpctl, also adjust the capability output.
  Note, this changes the JSON output of neighbors a bit.
- adjust the config parser to enable the RFC9234 role capability when
  there is a role set. iBGP and sessions with no role will not announce
  the role capability.
- adjust the role capability announcement to be only on sessions that
  use either AFI IPv4 or IPv6 and SAFI 1 (AID_INET, AID_INET6). If a
  session has neither of the two then do not announce the role (even if
  set)
- if there is an OPEN notification indicating that the role capability
  is bad only disable the capability if it is not enforced. If enforced
  do nothing and let the session fail (again and again).
- Adjust capability negotiation, store remote_role on the peer since
  the neighbors role is no longer needed.
- inject the OTC attribute on ingress only for AID_INET and AID_INET6.
  For other AIDs clear the F_ATTR_OTC_LOOP flag since OTC loop detection
  must be skipped for other address families.
- Adjust the role logic in the RDE and use the peer->role (local role of
  the system) for all checks. Also remove the check if the role capability
  was present (removal of peer_has_open_policy()).
  This change switches all role checks from raw capability codes to
  enum role and at the same time flips the logic since the view of the
  checks is changed from remote system to local system.
  The roles change like this: rs <-> rs-client, customer <-> provider,
  peer <-> peer
- In prefix_eligible() check also if the F_ATTR_OTC_LOOP flag is set.
  The RFC requires that prefixes must be considered ineligible (and not
  treat as withdraw)
- When generating an UPDATE include the OTC attribute unless the AID is
  neither AID_INET or AID_INET6 or the roles do not require the addtion.
  No longer depend on peer_has_open_policy() there.

Please test and check that my logic is actually correct and following the
RFC.
-- 
:wq Claudio

Index: usr.sbin/bgpctl/output.c
===
RCS file: /cvs/src/usr.sbin/bgpctl/output.c,v
retrieving revision 1.36
diff -u -p -r1.36 output.c
--- usr.sbin/bgpctl/output.c31 Jan 2023 14:32:43 -  1.36
+++ usr.sbin/bgpctl/output.c28 Feb 2023 05:46:41 -
@@ -281,6 +281,8 @@ show_neighbor_full(struct peer *p, struc
printf("\n");
if (p->conf.descr[0])
printf(" Description: %s\n", p->conf.descr);
+   if (p->conf.ebgp && p->conf.role != ROLE_NONE)
+   printf(" Role: %s\n", log_policy(p->conf.role));
if (p->conf.max_prefix) {
printf(" Max-prefix: %u", p->conf.max_prefix);
if (p->conf.max_prefix_restart)
@@ -330,7 +332,7 @@ show_neighbor_full(struct peer *p, struc
}
if (hascapamp || hascapaap || p->capa.peer.grestart.restart ||
p->capa.peer.refresh || p->capa.peer.enhanced_rr ||
-   p->capa.peer.as4byte || p->capa.peer.role_ena) {
+   p->capa.peer.as4byte || p->capa.peer.policy) {
printf("  Neighbor capabilities:\n");
if (hascapamp)
show_neighbor_capa_mp(>capa.peer);
@@ -344,10 +346,10 @@ show_neighbor_full(struct peer *p, struc
show_neighbor_capa_restart(>capa.peer);
if (hascapaap)
show_neighbor_capa_add_path(>capa.peer);
-   if (p->capa.peer.role_ena)
+   if (p->capa.peer.policy)
printf("Open Policy role %s (local %s)\n",
-   log_policy(p->capa.peer.role),
-   log_policy(p->capa.ann.role));
+   log_policy(p->remote_role),
+   log_policy(p->conf.role));
}
 
hascapamp = 0;
@@ -360,7 +362,7 @@ show_neighbor_full(struct peer *p, struc
}
if (hascapamp || hascapaap || p->capa.neg.grestart.restart ||
p->capa.neg.refresh || p->capa.neg.enhanced_rr ||
-   p->capa.neg.as4byte || p->capa.neg.role_ena) {
+   p->capa.neg.as4byte || p->capa.neg.policy) {
printf("  Negotiated capabilities:\n");
if (hascapamp)
show_neighbor_capa_mp(>capa.neg);
@@ -374,10 +376,10 @@ show_neighbor_full(struct peer *p, struc
show_neighbor_capa_restart(>capa.neg);
if (hascapaap)
show_neighbor_capa_add_path(>capa.neg);
-   if (p->capa.neg.role_ena)
+   if (p->capa.neg.policy)