On Tue, Jan 26, 2021 at 04:08:40PM +0000, Marek Klein wrote:

> Hi,
> 
> We are working on an appliance like product that is based on OpenBSD.
> Recently we found out that our performance critical C++ program is
> ~2.5 times slower on OpenBSD compared to Ubuntu 20.04.
> 
> The program basically just reads data from stdin, does some
> transformation of the data, and returns the result on stdout, thus
> the program does not perform any further I/O operations nor interacts
> with other programs. We extensively use the C++ standard library string
> class for manipulation of data.
> 
> We started searching for the reason, and eliminated I/O as a factor.
> During some experiments we found out that one, perhaps not the only
> one, factor is OpenBSD's memory management. To test this assumption we
> wrote a simple program that allocates and frees memory in a loop.
> Something like:
> 
> for (...) {
>   void *buffer = malloc(...);
>   ...
>   free(buffer);
> }
> 
> We compiled it on OpenBSD with clang
> $ /usr/bin/c++ --version
> OpenBSD clang version 10.0.1
> Target: amd64-unknown-openbsd6.8
> Thread model: posix
> InstalledDir: /usr/bin
> 
> using options '-O3 -DNDEBUG -std=gnu++11' and ran it without memory
> junking.
> 
> $ time MALLOC_OPTIONS=jj ./memory_allocs --cycles 123456789 --size 1024
> 
> real  0m27.218s
> user  0m27.220s
> sys   0m0.020s
> 
> We compiled the same program on Ubuntu 20.04 with g++
> $ /usr/bin/c++ --version
> c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
> 
> using the same options '-O3 -DNDEBUG -std=gnu++11'
> 
> $ time ./memory_allocs --cycles 123456789 --size 1024
> 
> real  0m1,920s
> user  0m1,915s
> sys   0m0,004s
> 
> Both systems were tested in the same virtualized environment (VSphere),
> thus we can assume the "hardware" is the same.
> 
> Given the virtual environment, the tests might not be scientifically
> the best choice, but they serve the observation well enough. We
> actually ruled out virtualization as a cause in other tests.

Short story: the slowness is because you get more security.

Somewhat longer story: depending on the size of the allocation, actual
unmaps take place on free. This will always catch use-after-free. For
smaller allocations, caching takes place. Sadly, you did not tell us
how big the total of your allocations is, so I cannot predict whether
enlarging the cache will help you.

Now, the difference is quite big, so I would like to know exactly what
you are doing in your test program. Please provide the full test program
so I can take a look.
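
For comparison, a loop of the kind described in the quoted message could
be sketched roughly as below. This is not the poster's program, which was
not included in the message. The function name time_alloc_loop, the
one-byte write, and the -1.0 error value are assumptions made for
illustration only.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not from the original message): runs `cycles`
// malloc/free pairs of `size` bytes and returns the elapsed wall-clock
// time in seconds. Returns -1.0 if size is 0 or an allocation fails.
double time_alloc_loop(std::size_t cycles, std::size_t size) {
    if (size == 0) {
        return -1.0;  // malloc(0) may legitimately return NULL, so reject it
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < cycles; ++i) {
        void* buffer = std::malloc(size);
        if (buffer == nullptr) {
            return -1.0;
        }
        // Write one byte through a volatile pointer. This is intended to
        // discourage the compiler from optimizing the malloc/free pair away.
        static_cast<volatile unsigned char*>(buffer)[0] = 1;
        std::free(buffer);
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Calling time_alloc_loop(123456789, 1024) would roughly correspond to the
--cycles and --size values quoted above. Because every iteration frees the
buffer before the next malloc, a loop like this mostly measures the fast
path of the allocator, which is why the total size of live allocations in
the real program matters.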

> 
> What other options are there we could try in order to speed the memory
> management up?

Some hints: allocate/free less, use better algorithms that do not
allocate as much. With C++, make sure your code uses moves of objects
instead of copies whenever possible. Use reserve() wisely. If all else
fails, you might go for custom allocators, but you will lose security
features.
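
As a minimal sketch of the reserve() and move hints, assuming the program
builds std::string values piece by piece: the helper names join and store
are hypothetical and only serve as illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins the parts with a separator. The final length
// is computed first so the result needs at most one allocation.
std::string join(const std::vector<std::string>& parts, char sep) {
    std::size_t total = parts.empty() ? 0 : parts.size() - 1;  // separators
    for (const std::string& p : parts) {
        total += p.size();
    }

    std::string out;
    out.reserve(total);  // avoids repeated regrowth while appending
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            out.push_back(sep);
        }
        out += parts[i];
    }
    return out;
}

// Hypothetical helper: hands a finished string to a container. Moving
// transfers the existing buffer, so the string contents are not
// allocated a second time.
void store(std::vector<std::string>& sink, std::string line) {
    sink.push_back(std::move(line));
}
```

Without the reserve() call, the appends in join may reallocate several
times as the string grows, and each reallocation is another trip through
malloc and free. How much this helps depends on the sizes involved, so it
is worth measuring.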

        -Otto

> 
> Also are there any other known areas, for CPU bound processing, where
> OpenBSD performs worse than other "common" platforms? 
> 
> Cheers,
> Marek
> 
