git: 35232d8edb5b - main - papers: Retire phkmalloc paper

Ed Maste Wed, 26 Nov 2025 16:22:04 -0800

The branch main has been updated by emaste:

URL: 
https://cgit.FreeBSD.org/src/commit/?id=35232d8edb5b8b3f18b6d9b2a93ecc252f8ed5a2


commit 35232d8edb5b8b3f18b6d9b2a93ecc252f8ed5a2
Author:     Ed Maste <[email protected]>
AuthorDate: 2025-06-17 15:07:42 +0000
Commit:     Ed Maste <[email protected]>
CommitDate: 2025-11-27 00:21:08 +0000

    papers: Retire phkmalloc paper
    
    It has not been installed since commit cdc37953165c ("In preparation for
    the removal of the roff toolchain, disconnect the") and turned up in
    a search for outdated MALLOC_OPTIONS settings.
    
    The rendered paper is available at
    https://papers.freebsd.org/1998/phk-malloc
    
    PR:             287357
    Reviewed by:    bapt
    Event:          Kitchener-Waterloo Hackathon 202506
    Sponsored by:   The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D50908
---
 share/doc/papers/Makefile                 |   1 -
 share/doc/papers/malloc/Makefile          |   7 -
 share/doc/papers/malloc/abs.ms            |  33 -----
 share/doc/papers/malloc/alternatives.ms   |  43 ------
 share/doc/papers/malloc/conclusion.ms     |  46 ------
 share/doc/papers/malloc/implementation.ms | 223 ------------------------------
 share/doc/papers/malloc/intro.ms          |  72 ----------
 share/doc/papers/malloc/kernel.ms         |  54 --------
 share/doc/papers/malloc/malloc.ms         |  70 ----------
 share/doc/papers/malloc/performance.ms    | 111 ---------------
 share/doc/papers/malloc/problems.ms       |  52 -------
 11 files changed, 712 deletions(-)

diff --git a/share/doc/papers/Makefile b/share/doc/papers/Makefile
index 3c702b2aeaaa..0ab8a40d2cb7 100644
--- a/share/doc/papers/Makefile
+++ b/share/doc/papers/Makefile
@@ -7,7 +7,6 @@ SUBDIR= beyond4.3 \
        jail \
        kernmalloc \
        kerntune \
-       malloc \
        newvm \
        relengr \
        sysperf \
diff --git a/share/doc/papers/malloc/Makefile b/share/doc/papers/malloc/Makefile
deleted file mode 100644
index 755264466aa3..000000000000
--- a/share/doc/papers/malloc/Makefile
+++ /dev/null
@@ -1,7 +0,0 @@
-VOLUME=        papers
-DOC=   malloc
-SRCS=  abs.ms intro.ms kernel.ms malloc.ms problems.ms alternatives.ms \
-       performance.ms implementation.ms conclusion.ms
-MACROS=        -ms
-
-.include <bsd.doc.mk>
diff --git a/share/doc/papers/malloc/abs.ms b/share/doc/papers/malloc/abs.ms
deleted file mode 100644
index ef70cec772af..000000000000
--- a/share/doc/papers/malloc/abs.ms
+++ /dev/null
@@ -1,33 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.if n .ND
-.TL
-Malloc(3) in modern Virtual Memory environments.
-.sp
-Revised 
-Fri Apr  5 12:50:07  1996
-.AU
-Poul-Henning Kamp
-.AI
-<[email protected]>
-Den Andensidste Viking
-Valbygaardsvej 8
-DK-4200 Slagelse
-Denmark
-.AB
-Malloc/free is one of the oldest parts of the C language environment
-and obviously the world has changed a bit since it was first made.
-The fact that most UNIX kernels have changed from swap/segment to
-virtual memory/page based memory management has not been sufficiently
-reflected in the implementations of the malloc/free API.
-.PP
-A new implementation was designed, written, tested and bench-marked
-with an eye on the workings and performance characteristics of modern
-Virtual Memory systems.
-.AE
diff --git a/share/doc/papers/malloc/alternatives.ms 
b/share/doc/papers/malloc/alternatives.ms
deleted file mode 100644
index 02efa1f88901..000000000000
--- a/share/doc/papers/malloc/alternatives.ms
+++ /dev/null
@@ -1,43 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH Alternative implementations
-.NH
-Alternative implementations
-.PP
-These problems were actually the inspiration for the first alternative
-malloc implementations.
-Since their main aim was debugging, they would often use techniques
-like allocating a guard zone before and after the chunk,
-and possibly filling these guard zones
-with some pattern, so accesses outside the allocated chunk could be detected
-with some decent probability.
-Another widely used technique is to use tables to keep track of which
-chunks are actually in which state and so on.
-.PP
-This class of debugging has been taken to its practical extreme by
-the product "Purify" which does the entire memory-coloring exercise
-and not only keeps track of what is in use and what isn't, but also
-detects if the first reference is a read (which would return undefined
-values) and other such violations.
-.PP
-Later actual complete implementations of malloc arrived, but many of
-these still based their workings on the basic schema mentioned previously,
-disregarding that in the meantime virtual memory and paging have
-become the standard environment.
-.PP
-The most widely used "alternative" malloc is undoubtedly ``gnumalloc''
-which has received wide acclaim and certainly runs faster than
-most stock mallocs.  It does, however, tend to fare badly in
-cases where paging is the norm rather than the exception.
-.PP
-The particular malloc that prompted this work basically didn't bother 
-reusing storage until the kernel forced it to do so by refusing 
-further allocations with sbrk(2).
-That may make sense if you work alone on your own personal mainframe,
-but as a general policy it is less than optimal.
diff --git a/share/doc/papers/malloc/conclusion.ms 
b/share/doc/papers/malloc/conclusion.ms
deleted file mode 100644
index 9d0d1f2a83a6..000000000000
--- a/share/doc/papers/malloc/conclusion.ms
+++ /dev/null
@@ -1,46 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH Conclusion and experience.
-.NH
-Conclusion and experience.
-.PP
-In general the performance differences between gnumalloc and this
-malloc are not that big.
-The major difference comes when primary storage is seriously 
-over-committed, in which case gnumalloc
-wastes time paging in pages it's not going to use.
-In such cases as much as a factor of five in wall-clock time has 
-been seen in difference.
-Apart from that gnumalloc and this implementation are pretty
-much head-on performance-wise.
-.PP
-Several legacy programs in the BSD 4.4 Lite distribution had
-code that depended on the memory returned from malloc
-being zeroed.  In a couple of cases, free(3) was called more than
-once for the same allocation, and a few cases even called free(3)
-with pointers to objects in the data section or on the stack.
-.PP
-A couple of users have reported that using this malloc on other
-platforms yielded "pretty impressive results", but no hard benchmarks
-have been made.
-.ds RH Acknowledgements & references.
-.NH
-Acknowledgements & references.
-.PP
-The first implementation of this algorithm was actually a file system,
-done in assembler using 5-hole ``Baudot'' paper tape for a drum storage
-device attached to a 20 bit germanium transistor computer with 2000 words
-of memory, but that was many years ago.
-.PP
-Peter Wemm <[email protected]> came up with the idea to store the
-page-directory in mmap(2)'ed memory instead of in the heap.
-This has proven to be a good move.
-.PP
-Lars Fredriksen <[email protected]> found and identified a
-fence-post bug in the code.
diff --git a/share/doc/papers/malloc/implementation.ms 
b/share/doc/papers/malloc/implementation.ms
deleted file mode 100644
index f9b547be18f5..000000000000
--- a/share/doc/papers/malloc/implementation.ms
+++ /dev/null
@@ -1,223 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH Implementation
-.NH
-Implementation
-.PP
-A new malloc(3) implementation was written to meet the goals,
-and to the extent possible to address the shortcomings listed previously.
-.PP
-The source is 1218 lines of C code, and can be found in FreeBSD 2.2
-(and probably later versions as well) as src/lib/libc/stdlib/malloc.c.
-.PP
-The main data structure is the
-.I page-directory
-which contains a
-.B void*
-for each page we have control over.
-The value can be one of:
-.IP
-.B MALLOC_NOT_MINE
-Another part of the code may call brk(2) to get a piece of the cake.
-Consequently, we cannot rely on the memory we get from the kernel
-being one consecutive piece of memory, and therefore we need a way to
-mark such pages as "untouchable".
-.IP
-.B MALLOC_FREE
-This is a free page.
-.IP
-.B MALLOC_FIRST
-This is the first page in a (multi-)page allocation.
-.IP
-.B MALLOC_FOLLOW
-This is a subsequent page in a multi-page allocation.
-.IP
-.B
-struct pginfo*
-.R
-A pointer to a structure describing a partitioned page.
-.PP
-In addition, there exists a linked list of small data structures that
-describe the free space as runs of free pages.
-.PP
-Notice that these structures are not part of the free pages themselves,
-but rather allocated with malloc so that the free pages themselves
-are never referenced while they are free.
-.PP
-When a request for storage comes in, it will be treated as a ``page''
-allocation if it is bigger than half a page.
-The free list will be searched and the first run of free pages that
-can satisfy the request is used.  The first page gets set to
-.B MALLOC_FIRST
-status.  If more than that one page is needed, the rest of them get
-.B MALLOC_FOLLOW
-status in the page-directory.
-.PP
-If there were no pages on the free list, brk(2) will be called, and
-the pages will get added to the page-directory with status
-.B MALLOC_FREE
-and the search restarts.
-.PP
-Freeing a number of pages is done by changing their state in the 
-page directory to MALLOC_FREE, and then traversing the free-pages list to
-find the right place for this run of pages, possibly collapsing
-with the two neighboring runs into one run and, if possible,
-releasing some memory back to the kernel by calling brk(2).
-.PP
-If the request is less than or equal to half of a page, its size will be
-rounded up to the nearest power of two before being processed
-and if the request is less than some minimum size, it is rounded up to
-that size.
-.PP
-These sub-page allocations are served from pages which are split up
-into some number of equal size chunks.
-For each of these pages a
-.B
-struct pginfo
-.R
-describes the size of the chunks on this page, how many there are,
-how many are free and so on.
-The description consist of a bitmap of used chunks, and various counters
-and numbers used to keep track of the stuff in the page.
-.PP
-For each size of sub-page allocation, the pginfo structures for the
-pages that have free chunks in them form a list.
-The heads of these lists are stored in predetermined slots at
-the beginning of the page directory to make access fast.
-.PP
-To allocate a chunk of some size, the head of the list for the
-corresponding size is examined, and a free chunk found.  The number
-of free chunks on that page is decreased by one and, if zero, the
-pginfo structure is unlinked from the list.
-.PP
-To free a chunk, the page is derived from the pointer, the page table
-for that page contains a pointer to the pginfo structure, where the
-free bit is set for the chunk, the number of free chunks increased by
-one, and if equal to one, the pginfo structure is linked into the
-proper place on the list for this size of chunks.
-If the count increases to match the number of chunks on the page, the
-pginfo structure is unlinked from the list and free(3)'ed and the 
-actual page itself is free(3)'ed too.
-.PP
-To be 100% correct performance-wise these lists should be ordered
-according to the recent number of accesses to that page.  This 
-information is not available and it would essentially mean a reordering
-of the list on every memory reference to keep it up-to-date.
-Instead they are ordered according to the address of the pages.
-Interestingly enough, in practice this comes out to almost the same 
-thing performance-wise.
-.PP
-It's not that surprising after all, it's the difference between
-following the crowd or actively directing where it can go, in both
-ways you can end up in the middle of it all.
-.PP
-The side effect of this compromise is that it also uses less storage,
-and the list never has to be reordered, all the ordering happens when
-pages are added or deleted.
-.PP
-It is an interesting twist to the implementation that the
-.B
-struct pginfo
-.R
-is allocated with malloc.
-That is, "as with malloc" to be painfully correct.
-The code knows the special case where the first (couple) of allocations on
-the page is actually the pginfo structure and deals with it accordingly.
-This avoids some silly "chicken and egg" issues.
-.ds RH Bells and whistles.
-.NH
-Bells and whistles.
-.PP
-brk(2) is actually not a very fast system call when you ask for storage.
-This is mainly because of the need by the kernel to zero the pages before
-handing them over, so therefore this implementation does not release 
-heap pages until there is a large chunk to release back to the kernel.
-Chances are pretty good that we will need it again pretty soon anyway.
-Since these pages are not accessed at all, they will soon be paged out
-and don't affect anything but swap-space usage.
-.PP
-The page directory is actually kept in a mmap(2)'ed piece of
-anonymous memory.  This avoids some rather silly cases that
-would otherwise have to be handled when the page directory
-has to be extended.
-.PP
-One particularly nice feature is that all pointers passed to free(3)
-and realloc(3) can be checked conclusively for validity:
-First the pointer is masked to find the page.  The page directory
-is then examined, it must contain either MALLOC_FIRST, in which
-case the pointer must point exactly at the page, or it can contain
-a struct pginfo*, in which case the pointer must point to one of
-the chunks described by that structure.
-Warnings will be printed on
-.B stderr
-and nothing will be done with
-the pointer if it is found to be invalid.
-.PP
-An environment variable
-.B MALLOC_OPTIONS
-allows the user some control over the behavior of malloc.
-Some of the more interesting options are:
-.IP
-.B Abort
-If malloc fails to allocate storage, core-dump the process with
-a message rather than expect it handle this correctly.
-It's amazing how few programs actually handle this condition correctly,
-and consequently the havoc they can create is the more creative or
-destructive.
-.IP
-.B Dump
-Writes malloc statistics to a file called ``malloc.out'' prior
-to process termination.
-.IP
-.B Hint
-Pass a hint to the kernel about pages we no longer need through the
-madvise(2) system call.  This can help performance on machines that
-page heavily by eliminating unnecessary page-ins and page-outs of
-unused data.
-.IP
-.B Realloc
-Always do a free and malloc when realloc(3) is called.
-For programs doing garbage collection using realloc(3), this makes the
-heap collapse faster since malloc will reallocate from the 
-lowest available address.
-The default
-is to leave things alone if the size of the allocation is still in
-the same size-class.
-.IP
-.B Junk
-will explicitly fill the allocated area with a particular value
-to try to detect if programs rely on it being zero.
-.IP
-.B Zero
-will explicitly zero out the allocated chunk of memory, while any
-space after the allocation in the chunk will be filled with the
-junk value to try to catch out of the chunk references.
-.ds RH The road not taken.
-.NH
-The road not yet taken.
-.PP
-A couple of avenues were explored that could be interesting in some
-set of circumstances.
-.PP
-Using mmap(2) instead of brk(2) was actually slower, since brk(2)
-knows a lot of the things that mmap has to find out first.
-.PP
-In general there is little room for further improvement of the
-time-overhead of the malloc, further improvements will have to
-be in the area of improving paging behavior.
-.PP
-It is still under consideration to add a feature such that
-if realloc is called with two zero arguments, the internal
-allocations will be reallocated to perform a garbage collect.
-This could be used in certain types of programs to collapse
-the memory use, but so far it doesn't seem to be worth the effort.
-.PP
-Malloc/Free can be a significant point of contention in multi-threaded
-programs.  Low-grain locking of the data-structures inside the 
-implementation should be implemented to avoid excessive spin-waiting.
diff --git a/share/doc/papers/malloc/intro.ms b/share/doc/papers/malloc/intro.ms
deleted file mode 100644
index 59064ee166f1..000000000000
--- a/share/doc/papers/malloc/intro.ms
+++ /dev/null
@@ -1,72 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH Introduction
-.NH
-Introduction
-.PP
-Most programs need to allocate storage dynamically in addition
-to whatever static storage the compiler reserved at compile-time.
-To C programmers this fact is rather obvious, but for many years
-this was not an accepted and recognized fact, and many languages 
-still used today don't support this notion adequately.
-.PP
-The classic UNIX kernel provides two very simple and powerful
-mechanisms for obtaining dynamic storage, the execution stack 
-and the heap.
-The stack is usually put at the far upper end of the address-space,
-from where it grows down as far as needed, though this may depend on
-the CPU design.
-The heap starts at the end of the
-.B bss
-segment and grows upwards as needed.
-.PP
-There isn't really a kernel-interface to the stack as such.
-The kernel will allocate some amount of memory for it,
-not even telling the process the exact size.
-If the process needs more space than that, it will simply try to access
-it, hoping that the kernel will detect that an access has been 
-attempted outside the allocated memory, and try to extend it.
-If the kernel fails to extend the stack, this could be because of lack
-of resources or permissions or because it may just be impossible
-to do in the first place, the process will usually be shot down by the 
-kernel.
-.PP
-In the C language, there exists a little used interface to the stack,
-.B alloca(3) ,
-which will explicitly allocate space on the stack.
-This is not an interface to the kernel, but merely an adjustment
-done to the stack-pointer such that space will be available and
-unharmed by any subroutine calls yet to be made while the context
-of the current subroutine is intact.
-.PP
-Due to the nature of normal use of the stack, there is no corresponding
-"free" operator, but instead the space is returned when the current
-function returns to its caller and the stack frame is dismantled.
-This is the cause of much grief, and probably the single most important
-reason that alloca(3) is not, and should not be, used widely.
-.PP
-The heap on the other hand has an explicit kernel-interface in the 
-system call
-.B brk(2) .
-The argument to brk(2) is a pointer to where the process wants the
-heap to end.
-There is also an interface called
-.B sbrk(2)
-taking an increment to the current end of the heap, but this is merely a
-.B libc
-front for brk(2).
-.PP
-In addition to these two memory resources, modern virtual memory kernels
-provide the mmap(2)/munmap(2) interface which allows almost complete
-control over any bit of virtual memory in the process address space.
-.PP
-Because of the generality of the mmap(2) interface and the way the 
-data structures representing the regions are laid out, sbrk(2) is actually
-faster in use than the equivalent mmap(2) call, simply because
-mmap(2) has to search for information that is implicit in the sbrk(2) call.
diff --git a/share/doc/papers/malloc/kernel.ms 
b/share/doc/papers/malloc/kernel.ms
deleted file mode 100644
index 3672065ddef4..000000000000
--- a/share/doc/papers/malloc/kernel.ms
+++ /dev/null
@@ -1,54 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH The kernel and memory
-.NH
-The kernel and memory
-.PP
-Brk(2) isn't a particularly convenient interface,
-it was probably made more to fit the memory model of the 
-hardware being used, than to fill the needs of the programmers.
-.PP
-Before paged and/or virtual memory systems became
-common, the most popular memory management facility used for
-UNIX was segments.
-This was also very often the only vehicle for imposing protection on 
-various parts of memory.
-Depending on the hardware, segments can be anything, and consequently 
-how the kernels exploited them varied a lot from UNIX to UNIX and from
-machine to machine.
-.PP
-Typically a process would have one segment for the text section, one
-for the data and bss section combined and one for the stack.
-On some systems the text shared a segment with the data and bss, and was
-consequently just as writable as them.
-.PP
-In this setup all the brk(2) system call has to do is to find the
-right amount of free storage, possibly moving things around in physical
-memory, maybe even swapping out a segment or two to make space,
-and change the upper limit on the data segment according to the address given.
-.PP
-In a more modern page based virtual memory implementation this is still
-pretty much the situation, except that the granularity is now pages:
-The kernel finds the right number of free pages, possibly paging some
-pages out to free them up, and then plugs them into the page-table of 
-the process.
-.PP
-As such the difference is very small, the real difference is that in
-the old world of swapping, either the entire process was in primary
-storage or it wouldn't be selected to be run.  In a modern VM kernel,
-a process might only have a subset of its pages in primary memory,
-the rest will be paged in, if and when the process tries to access them.
-.PP
-Only very few programs deal with the brk(2) interface directly.
-The few that do usually have their own memory management facilities.
-LISP or FORTH interpreters are good examples.
-Most other programs use the
-.B malloc(3) 
-interface instead, and leave it to the malloc implementation to 
-use brk(2) to get storage allocated from the kernel.
diff --git a/share/doc/papers/malloc/malloc.ms 
b/share/doc/papers/malloc/malloc.ms
deleted file mode 100644
index 79e5173226b2..000000000000
--- a/share/doc/papers/malloc/malloc.ms
+++ /dev/null
@@ -1,70 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH Malloc and free
-.NH
-Malloc and free
-.PP
-The job of malloc(3) is to turn the rather simple
-brk(2) facility into a service programs can
-actually use without getting hurt.
-.PP
-The archetypical malloc(3) implementation keeps track of the memory between
-the end of the bss section, as defined by the 
-.B _end 
-symbol, and the current brk(2) point using a linked list of chunks of memory.
-Each item on the list has a status as either free or used, a pointer
-to the next entry and in most cases to the previous as well, to speed
-up inserts and deletes in the list.
-.PP
-When a malloc(3) request comes in, the list is traversed from the
-front and if a free chunk big enough to hold the request is found,
-it is returned, if the free chunk is bigger than the size requested,
-a new free chunk is made from the excess and put back on the list.
-.PP
-When a chunk is 
-.B free(3) 'ed,
-the chunk is found in the list, its status
-is changed to free and if one or both of the surrounding chunks
-are free, they are collapsed to one.
-.PP
-A third kind of request, 
-.B realloc(3) ,
-will resize
-a chunk, trying to avoid copying the contents if possible.
-It is seldom used, and has only had a significant impact on performance
-in a few special situations.
-The typical pattern of use is to malloc(3) a chunk of the maximum size 
-needed, read in the data and adjust the size of the chunk to match the
-size of the data read using realloc(3).
-.PP
-For reasons of efficiency, the original implementation of malloc(3)
-put the small structure used to contain the next and previous pointers
-plus the state of the chunk right before the chunk itself.
-.PP
-As a matter of fact, the canonical malloc(3) implementation can be 
-studied in the ``Old testament'', chapter 8 verse 7 [Kernighan & Ritchie]
-.PP
-Various optimisations can be applied to the above basic algorithm:
-.IP
-If in freeing a chunk, we end up with the last chunk on the list being
-free, we can return that to the kernel by calling brk(2) with the first
-address of that chunk and then make the previous chunk the last on the
-chain by terminating its ``next'' pointer.
-.IP
-A best-fit algorithm can be used instead of first-fit at an expense 
-of memory, because statistically fewer chances to brk(2) backwards will 
-present themselves.
-.IP
-Splitting the list in two, one for used and one for free chunks, to
-speed the searching.
-.IP
-Putting free chunks on one of several free lists, depending on their size,
-to speed allocation.
-.IP
-\&...
diff --git a/share/doc/papers/malloc/performance.ms 
b/share/doc/papers/malloc/performance.ms
deleted file mode 100644
index 49e9d6b75517..000000000000
--- a/share/doc/papers/malloc/performance.ms
+++ /dev/null
@@ -1,111 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH Performance
-.NH
-Performance
-.PP
-Performance for a malloc(3) implementation comes as two variables:
-.IP
-A: How much time does it use for searching and manipulating data structures.
-We will refer to this as ``overhead time''.
-.IP
-B: How well does it manage the storage.
-This rather vague metric we call ``quality of allocation''.
-.PP
-The overhead time is easy to measure, just do a lot of malloc/free calls
-of various kinds and combination, and compare the results.
-.PP
-The quality of allocation is not quite as simple as that.
-One measure of quality is the size of the process, that should obviously
-be minimized.
-Another measure is the execution time of the process.
-This is not an obvious indicator of quality, but people will generally
-agree that it should be minimized as well, and if malloc(3) can do
-anything to do so, it should.
-Explanation why it is still a good metric follows:
-.PP
-In a traditional segment/swap kernel, the desirable behavior of a process
-is to keep the brk(2) as low as possible, thus minimizing the size of the
-data/bss/heap segment, which in turn translates to a smaller process and
-a smaller probability of the process being swapped out, qed: faster
-execution time as an average.
-.PP
-In a paging environment this is not a bad choice for a default, but
-a couple of details needs to be looked at much more carefully.
-.PP
-First of all, the size of a process becomes a more vague concept since
-only the pages that are actually used need to be in primary storage
-for execution to progress, and they only need to be there when used.
-That implies that many more processes can fit in the same amount of
-primary storage, since most processes have a high degree of locality
-of reference and thus only need some fraction of their pages to actually
-do their job.
-.PP
-From this it follows that the interesting size of the process, is some
-subset of the total amount of virtual memory occupied by the process.
-This number isn't a constant, it varies depending on the whereabouts
-of the process, and it may indeed fluctuate wildly over the lifetime
-of the process.
-.PP
-One of the names for this vague concept is ``current working set''.
-It has been defined many different ways over the years, mostly to
-satisfy and support claims in marketing or benchmark contexts.
-.PP
-For now we can simply say that it is the number of pages the process
-needs in order to run at a sufficiently low paging rate in a congested
-primary storage.
-(If primary storage isn't congested, this is not really important 
-of course, but most systems would be better off using the pages for
-disk-cache or similar functions, so from that perspective it will
-always be congested.)
-If the number of pages is too small, the process will wait for its
-pages to be read from secondary storage much of the time, if it's too
-big, the space could be used better for something else.
-.PP
-From the view of any single process, this number of pages is 
-"all of my pages", but from the point of view of the OS it should
-be tuned to maximise the total throughput of all the processes on
-the machine at the time.
-This is usually done using various kinds of least-recently-used 
-replacement algorithms to select page candidates for replacement.
-.PP
-With this knowledge, can we decide what the performance goal is for
-a modern malloc(3) ?
-Well, it's almost as simple as it used to be:
-.B
-Minimize the number of pages accessed.
-.R
-.PP
-This really is the core of it all.
-If the number of accessed pages is smaller, then locality of reference is
-higher, and all kinds of caches (which is essentially what the
-primary storage is in a VM system) work better.
-.PP
-It's interesting to notice that the classical malloc fails on this one
-because the information about free chunks is kept with the free
-chunks themselves.  In some of the benchmarks this came out as all the
-pages being paged in every time a malloc call was made, because malloc
-had to traverse the free list to find a suitable chunk for the allocation.
-If memory is not in use, then you shouldn't access it.
-.PP
-The secondary goal is more evident:
-.B
-Try to work in pages.
-.R
-.PP
-That makes it easier for the kernel, and wastes less virtual memory.
-Most modern implementations do this when they interact with the
-kernel, but few try to avoid objects spanning pages.
-.PP
-If an object's size
-is less than or equal to a page, there is no reason for it to span two pages.
-Having objects span pages means that two pages must be
-paged in, if that object is accessed.
-.PP
-With this analysis in the luggage, we can start coding.
diff --git a/share/doc/papers/malloc/problems.ms 
b/share/doc/papers/malloc/problems.ms
deleted file mode 100644
index fa27effce999..000000000000
--- a/share/doc/papers/malloc/problems.ms
+++ /dev/null
@@ -1,52 +0,0 @@
-.\"
-.\" 
----------------------------------------------------------------------------
-.\" "THE BEER-WARE LICENSE" (Revision 42):
-.\" <[email protected]> wrote this file.  As long as you retain this notice you
-.\" can do whatever you want with this stuff. If we meet some day, and you 
think
-.\" this stuff is worth it, you can buy me a beer in return.   Poul-Henning 
Kamp
-.\" 
----------------------------------------------------------------------------
-.\"
-.ds RH The problems
-.NH
-The problems
-.PP
-Even though malloc(3) is a lot simpler to use
-than the raw brk(2)/sbrk(2) interface,
-or maybe exactly because
-of that,
-a lot of problems arise from its use.
-.IP
-Writing to memory outside the allocated chunk.
-The most likely result being that the data structure used to hold
-the links and flags about this chunk or the next one gets thrashed.
-.IP
-Freeing a pointer to memory not allocated by malloc.
-This is often a pointer that points to an object on the stack or in the
-data-section, in newer implementations of C it may even be in the text-
-section where it is likely to be readonly.
-Some malloc implementations detect this, some don't.
-.IP
-Freeing a modified pointer.  This is a very common mistake, freeing
-not the pointer malloc(3) returned, but rather some offset from it.
-Some mallocs will handle this correctly if the offset is positive.
-.IP
-Freeing the same pointer more than once.
-.IP
-Accessing memory in a chunk after it has been free(3)'ed.
-.PP
-The handling of these problems have traditionally been weak.
-A core-dump was the most common form for "handling", but in rare
-cases one could experience the famous "malloc: corrupt arena." 
-message before the core-dump.
-Even worse though, very often the program will just continue,
-possibly giving wrong results.
-.PP
-An entirely different form of problem is that
-the memory returned by malloc(3) can contain any value.
-Unfortunately most kernels, correctly, zero out the storage they 
-provide with brk(2), and thus the storage malloc returns will be zeroed 
-in many cases as well, so programmers are not particular apt to notice 
-that their code depends on malloc'ed storage being zeroed.
-.PP
-With problems this big and error handling this weak, it is not
-surprising that problems are hard and time consuming to find and fix.

git: 35232d8edb5b - main - papers: Retire phkmalloc paper

Reply via email to