Re: stop using mquery(2) inside realloc(3)

2022-06-28 Thread Theo de Raadt
This diff to stop using mquery has been in snaps for a few weeks, and I
finally noticed some fallout.  But it is amusing fallout: If you have a
snap with the diff installed, you cannot build an install media in a
shortcut way, since malloc.c in the real tree uses mquery, but the
installed libc doesn't use mquery.  So I need to build a -rHEAD real
libc, first.

Anyways, I think the diff received enough testing to be commited.



Re: stop using mquery(2) inside realloc(3)

2022-05-14 Thread Philip Guenther
On Sat, 14 May 2022, Philip Guenther wrote:
> On Sat, 14 May 2022, Theo de Raadt wrote:
> > I worry a little about having libc use an undocumented mmap(2) flag.
> > About as much as using mquery, which is non-standard.
> > 
> > Maybe __MAP_NOREPLACE should get documentation?  __MAP_NOFAULT is in the
> > same situation.  The behaviour of these flags should be documented
> > (set in stone), which may also discourage accidental behaviour changes
> > by kernel developers in the future?
> 
> To throw some spaghetti at the wall...

Fix the grammar of the __MAP_NOFAULT description and mention signals there.

Index: sys/mmap.2
===
RCS file: /data/src/openbsd/src/lib/libc/sys/mmap.2,v
retrieving revision 1.68
diff -u -p -r1.68 mmap.2
--- sys/mmap.2  31 Mar 2022 17:27:16 -  1.68
+++ sys/mmap.2  14 May 2022 19:31:28 -
@@ -58,8 +58,19 @@ The
 argument describes the address where the system should place the mapping.
 If the
 .Dv MAP_FIXED
-flag is specified, the allocation will happen at the specified address,
+flag is specified and the
+.Dv __MAP_NOREPLACE
+flag is not specified,
+the allocation will happen at the specified address,
 replacing any previously established mappings in its range.
+If both the
+.Dv MAP_FIXED
+and
+.Dv __MAP_NOREPLACE
+flags are specified,
+the allocation will happen at the specified address,
+failing with no effect if any previously established mappings are
+in its range.
 Otherwise, the mapping will be placed at the available spot at
 .Fa addr ;
 failing that it will be placed "close by".
@@ -153,6 +164,18 @@ mappings)
 must be multiples of the page size.
 Existing mappings in the address range will be replaced.
 Use of this option is discouraged.
+.It Dv __MAP_NOFAULT
+Indicate that access to pages that are not backed by the mapped
+file or device will result in zero-filled anonymous pages being
+provided instead of a signal being delivered.
+This flag must not be used with an anonymous mapping.
+.It Dv __MAP_NOREPLACE
+Indicates that a
+.Dv MAP_FIXED
+mapping should fail if it would require replacing any existing
+mappings.
+This flag must be used in combination with
+.Dv MAP_FIXED .
 .It Dv MAP_STACK
 Indicate that the mapping is used as a stack.
 This flag must be used in combination with
@@ -278,6 +301,12 @@ device does not support memory mapping.
 The allocation
 .Fa len
 was 0.
+.It Bq Er EINVAL
+.Dv __MAP_NOFAULT
+was specified for an anonymous mapping or
+.Dv __MAP_NOREPLACE
+was specified without
+.Dv MAP_FIXED .
 .It Bq Er ENOMEM
 .Dv MAP_FIXED
 was specified and the
@@ -323,6 +352,14 @@ A fully functional
 system call first appeared in SunOS 4.0
 and has been available since
 .Bx 4.3 Net/2 .
+The
+.Dv __MAP_NOFAULT
+flag appeared in
+.Ox 5.7 .
+The
+.Dv __MAP_NOREPLACE
+flag appeared in
+.Ox 5.3 .
 .Sh CAVEATS
 .St -p1003.1-2008
 specifies that references to pages beyond the end of a mapped object



Re: stop using mquery(2) inside realloc(3)

2022-05-14 Thread Philip Guenther
On Sat, 14 May 2022, Theo de Raadt wrote:
> I worry a little about having libc use an undocumented mmap(2) flag.
> About as much as using mquery, which is non-standard.
> 
> Maybe __MAP_NOREPLACE should get documentation?  __MAP_NOFAULT is in the
> same situation.  The behaviour of these flags should be documented
> (set in stone), which may also discourage accidental behaviour changes
> by kernel developers in the future?

To throw some spaghetti at the wall...

Index: sys/mmap.2
===
RCS file: /data/src/openbsd/src/lib/libc/sys/mmap.2,v
retrieving revision 1.68
diff -u -p -r1.68 mmap.2
--- sys/mmap.2  31 Mar 2022 17:27:16 -  1.68
+++ sys/mmap.2  14 May 2022 19:06:00 -
@@ -58,8 +58,19 @@ The
 argument describes the address where the system should place the mapping.
 If the
 .Dv MAP_FIXED
-flag is specified, the allocation will happen at the specified address,
+flag is specified and the
+.Dv __MAP_NOREPLACE
+flag is not specified,
+the allocation will happen at the specified address,
 replacing any previously established mappings in its range.
+If both the
+.Dv MAP_FIXED
+and
+.Dv __MAP_NOREPLACE
+flags are specified,
+the allocation will happen at the specified address,
+failing with no effect if any previously established mappings are
+in its range.
 Otherwise, the mapping will be placed at the available spot at
 .Fa addr ;
 failing that it will be placed "close by".
@@ -153,6 +164,17 @@ mappings)
 must be multiples of the page size.
 Existing mappings in the address range will be replaced.
 Use of this option is discouraged.
+.It Dv __MAP_NOFAULT
+Indicate that access to pages that are not backed by the mapped
+file or device will be replaced by zero-filled anonymous pages.
+This flag must not be used with an anonymous mapping.
+.It Dv __MAP_NOREPLACE
+Indicates that a
+.Dv MAP_FIXED
+mapping should fail if it would require replacing any existing
+mappings.
+This flag must be used in combination with
+.Dv MAP_FIXED .
 .It Dv MAP_STACK
 Indicate that the mapping is used as a stack.
 This flag must be used in combination with
@@ -278,6 +300,12 @@ device does not support memory mapping.
 The allocation
 .Fa len
 was 0.
+.It Bq Er EINVAL
+.Dv __MAP_NOFAULT
+was specified for an anonymous mapping or
+.Dv __MAP_NOREPLACE
+was specified without
+.Dv MAP_FIXED .
 .It Bq Er ENOMEM
 .Dv MAP_FIXED
 was specified and the
@@ -323,6 +351,14 @@ A fully functional
 system call first appeared in SunOS 4.0
 and has been available since
 .Bx 4.3 Net/2 .
+The
+.Dv __MAP_NOFAULT
+flag appeared in
+.Ox 5.7 .
+The
+.Dv __MAP_NOREPLACE
+flag appeared in
+.Ox 5.3 .
 .Sh CAVEATS
 .St -p1003.1-2008
 specifies that references to pages beyond the end of a mapped object



Re: stop using mquery(2) inside realloc(3)

2022-05-14 Thread Theo de Raadt
I worry a little about having libc use an undocumented mmap(2) flag.
About as much as using mquery, which is non-standard.

Maybe __MAP_NOREPLACE should get documentation?  __MAP_NOFAULT is in the
same situation.  The behaviour of these flags should be documented
(set in stone), which may also discourage accidental behaviour changes
by kernel developers in the future?



Re: stop using mquery(2) inside realloc(3)

2022-05-14 Thread Otto Moerbeek
On Fri, May 13, 2022 at 08:21:19PM -0700, Philip Guenther wrote:

> If you try to grow a 'large' (at least half a page) allocation with 
> realloc(3), it'll try to extend the memory mapping for it and if that 
> works it won't need to move it.

(on a side note: it does not do anything at all if the # of pages does
not grow).
> 
> Currently, it does that by using mquery(2) to see if that area is 
> available and if so then trying to mmap it, munmaping it if mmap(2) didn't 
> return the desired address (perhaps due to a race with another thread).  
> Instead of doing mquery/mmap/munmap, this path can use the MAP_FIXED and 
> __MAP_NOREPLACE flags to directly request the specific address but not 
> change anything if it's not available.
> 
> 
> (We still use mquery in ld.so on i386 as a performance optimization, but 
> that use case differs in needing to find a base address for *multiple* 
> mappings, where unrolling partial hits grows to be more expensive than 
> probing with mquery and only trying to do all the mapping on a successful 
> set of probes: recall that on x86 munmap() is more expensive than mmap() 
> due to TLB shootdowns. This case in realloc() only has a single mapping to 
> probe/extend so it avoids those costs.  Indeed, this diff eliminates the 
> current "munmap on failed attempt" path/cost.)
> 
> 
> The regress/lib/libc/malloc tests continue to pass on my amd64 box, with 
> ktrace confirming the change in calls.
> 
> oks?

I was looking through the commit log to see when this flag was introduced.
It was in sys/mman.h rev 1.21, but a bit hidden becuase the commit
message talks about __MAP_NOREMAP.

Anyway, I cannot oversee if it is relevant if all pmap implementations
implement this or if this is done in a higher layer. 

Also a tiny bit worried the __MAP_NOREPLACE path in uvm isn't
exercised as much, so it might reveal bugs. But thats no reason not to
do this now.

So if my first question can be answered with: it's in a higher layer,
OK.

-Otto

> 
> 
> Philip
> 
> 
> Index: stdlib/malloc.c
> ===
> RCS file: /data/src/openbsd/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.273
> diff -u -p -r1.273 malloc.c
> --- stdlib/malloc.c   26 Feb 2022 16:14:42 -  1.273
> +++ stdlib/malloc.c   14 May 2022 02:35:04 -
> @@ -100,9 +100,6 @@
>  #define MMAPA(a,sz,f)mmap((a), (sz), PROT_READ | PROT_WRITE, \
>  MAP_ANON | MAP_PRIVATE | (f), -1, 0)
>  
> -#define MQUERY(a,sz,f)   mquery((a), (sz), PROT_READ | PROT_WRITE, \
> -MAP_ANON | MAP_PRIVATE | MAP_FIXED | (f), -1, 0)
> -
>  struct region_info {
>   void *p;/* page; low bits used to mark chunks */
>   uintptr_t size; /* size for pages, or chunk_info pointer */
> @@ -1687,11 +1684,7 @@ orealloc(struct dir_info **argpool, void
>   size_t needed = rnewsz - roldsz;
>  
>   STATS_INC(pool->cheap_realloc_tries);
> - q = MQUERY(hint, needed, pool->mmap_flag);
> - if (q == hint)
> - q = MMAPA(hint, needed, 
> pool->mmap_flag);
> - else
> - q = MAP_FAILED;
> + q = MMAPA(hint, needed, MAP_FIXED | 
> __MAP_NOREPLACE | pool->mmap_flag);
>   if (q == hint) {
>   STATS_ADD(pool->malloc_used, needed);
>   if (pool->malloc_junk == 2)
> @@ -1709,9 +1702,6 @@ orealloc(struct dir_info **argpool, void
>   STATS_INC(pool->cheap_reallocs);
>   ret = p;
>   goto done;
> - } else if (q != MAP_FAILED) {
> - if (munmap(q, needed))
> - wrterror(pool, "munmap %p", q);
>   }
>   }
>   } else if (rnewsz < roldsz) {
> 



stop using mquery(2) inside realloc(3)

2022-05-13 Thread Philip Guenther
If you try to grow a 'large' (at least half a page) allocation with 
realloc(3), it'll try to extend the memory mapping for it and if that 
works it won't need to move it.

Currently, it does that by using mquery(2) to see if that area is 
available and if so then trying to mmap it, munmaping it if mmap(2) didn't 
return the desired address (perhaps due to a race with another thread).  
Instead of doing mquery/mmap/munmap, this path can use the MAP_FIXED and 
__MAP_NOREPLACE flags to directly request the specific address but not 
change anything if it's not available.


(We still use mquery in ld.so on i386 as a performance optimization, but 
that use case differs in needing to find a base address for *multiple* 
mappings, where unrolling partial hits grows to be more expensive than 
probing with mquery and only trying to do all the mapping on a successful 
set of probes: recall that on x86 munmap() is more expensive than mmap() 
due to TLB shootdowns. This case in realloc() only has a single mapping to 
probe/extend so it avoids those costs.  Indeed, this diff eliminates the 
current "munmap on failed attempt" path/cost.)


The regress/lib/libc/malloc tests continue to pass on my amd64 box, with 
ktrace confirming the change in calls.

oks?


Philip


Index: stdlib/malloc.c
===
RCS file: /data/src/openbsd/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.273
diff -u -p -r1.273 malloc.c
--- stdlib/malloc.c 26 Feb 2022 16:14:42 -  1.273
+++ stdlib/malloc.c 14 May 2022 02:35:04 -
@@ -100,9 +100,6 @@
 #define MMAPA(a,sz,f)  mmap((a), (sz), PROT_READ | PROT_WRITE, \
 MAP_ANON | MAP_PRIVATE | (f), -1, 0)
 
-#define MQUERY(a,sz,f) mquery((a), (sz), PROT_READ | PROT_WRITE, \
-MAP_ANON | MAP_PRIVATE | MAP_FIXED | (f), -1, 0)
-
 struct region_info {
void *p;/* page; low bits used to mark chunks */
uintptr_t size; /* size for pages, or chunk_info pointer */
@@ -1687,11 +1684,7 @@ orealloc(struct dir_info **argpool, void
size_t needed = rnewsz - roldsz;
 
STATS_INC(pool->cheap_realloc_tries);
-   q = MQUERY(hint, needed, pool->mmap_flag);
-   if (q == hint)
-   q = MMAPA(hint, needed, 
pool->mmap_flag);
-   else
-   q = MAP_FAILED;
+   q = MMAPA(hint, needed, MAP_FIXED | 
__MAP_NOREPLACE | pool->mmap_flag);
if (q == hint) {
STATS_ADD(pool->malloc_used, needed);
if (pool->malloc_junk == 2)
@@ -1709,9 +1702,6 @@ orealloc(struct dir_info **argpool, void
STATS_INC(pool->cheap_reallocs);
ret = p;
goto done;
-   } else if (q != MAP_FAILED) {
-   if (munmap(q, needed))
-   wrterror(pool, "munmap %p", q);
}
}
} else if (rnewsz < roldsz) {