> On Wed, Jan 27, 2021 at 08:39:46AM +0100, Otto Moerbeek wrote:
>
> > On Tue, Jan 26, 2021 at 04:08:40PM +0000, Marek Klein wrote:
> >
> > > Hi,
> > >
> > > We are working on an appliance-like product that is based on OpenBSD.
> > > Recently we found out that our performance critical C++ program is
> > > ~2.5 times slower on OpenBSD compared to Ubuntu 20.04.
> > >
> > > The program basically just reads data from stdin, does some
> > > transformation of the data, and returns the result on stdout, thus
> > > the program does not perform any further I/O operations nor interacts
> > > with other programs. We extensively use the C++ standard library string
> > > class for manipulation of data.
> > >
> > > We started searching for the reason and eliminated I/O as a factor.
> > > During some experiments we found out that one factor, perhaps not
> > > the only one, is OpenBSD's memory management. To test this assumption we
> > > wrote a simple program that allocates and frees memory in a loop.
> > > Something like:
> > >
> > > for (...) {
> > >     void *buffer = malloc(...);
> > >     ...
> > >     free(buffer);
> > > }
> > >
> > > We compiled it on OpenBSD with clang
> > > $ /usr/bin/c++ --version
> > > OpenBSD clang version 10.0.1
> > > Target: amd64-unknown-openbsd6.8
> > > Thread model: posix
> > > InstalledDir: /usr/bin
> > >
> > > using options '-O3 -DNDEBUG -std=gnu++11' and ran it without memory
> > > junking.
> > >
> > > $ time MALLOC_OPTIONS=jj ./memory_allocs --cycles 123456789 --size 1024
> > >
> > > real 0m27.218s
> > > user 0m27.220s
> > > sys 0m0.020s
> > >
> > > We compiled the same program on Ubuntu 20.04 with g++
> > > $ /usr/bin/c++ --version
> > > c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
> > >
> > > using the same options '-O3 -DNDEBUG -std=gnu++11'
> > >
> > > $ time ./memory_allocs --cycles 123456789 --size 1024
> > >
> > > real 0m1,920s
> > > user 0m1,915s
> > > sys 0m0,004s
> > >
> > > Both systems were tested in the same virtualized environment (VSphere),
> > > thus we can assume the "hardware" is the same.
> > >
> > > Given the virtual environment, the tests might not be scientifically
> > > the best choice, but they serve the observation well enough. We
> > > actually ruled out virtualization as a cause in other tests.
> >
> > Short story: the slowness is because you get more security.
> >
> > Somewhat longer story: depending on the size of the allocation, actual
> > unmaps take place on free. This will always catch use-after-free. For
> > smaller allocations, caching takes place; sadly, you did not tell us
> > how big the total of your allocations is, so I cannot predict whether
> > enlarging the cache will help you.
> >
> > Now the difference is quite big, so I would like to know what you are
> > doing exactly in your test program. Please provide the full test program
> > so I can take a look.
> >
> > >
> > > What other options are there we could try in order to speed the memory
> > > management up?
> >
> > Some hints: allocate/free less, use better algorithms that do not
> > allocate as much. With C++ make sure your code uses moves of objects
> > instead of copies whenever possible. Use reserve() wisely. If all else
> > fails you might go for custom allocators, but you will lose security
> > features.
> >
> > -Otto
> >
> > >
> > > Also are there any other known areas, for CPU bound processing, where
> > > OpenBSD performs worse than other "common" platforms?
> > >
> > > Cheers,
> > > Marek
> > >
> >
>
> To reply to myself.
>
> Be VERY careful when drawing conclusions from these kinds of test
> programs. To demonstrate, the loop in the test program below gets
> compiled out by some compilers with some settings.
>
> So again, please provide your test program.
>
> -Otto
>
> #include <err.h>
> #include <limits.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int
> main(int argc, char *argv[])
> {
>     size_t count, sz, i;
>     char *p;
>     const char *errstr;
>
>     count = strtonum(argv[1], 0, LONG_MAX, &errstr);
>     if (errstr)
>         errx(1, "%s: %s", argv[1], errstr);
>     sz = strtonum(argv[2], 0, LONG_MAX, &errstr);
>     if (errstr)
>         errx(1, "%s: %s", argv[2], errstr);
>
>     printf("Run with %zu %zu\n", count, sz);
>
>     for (i = 0; i < count; i++) {
>         p = malloc(sz);
>         if (p == NULL)
>             err(1, NULL);
>         *p = 1;
>         free(p);
>     }
> }
>
>
Hi Otto,

My test program does something very similar.
As stated before I compile with

1. OpenBSD: clang version 10.0.1 and
2. Ubuntu: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

with the same options '-O3 -DNDEBUG -std=gnu++11'.

The execution time grows with the number of cycles and also with
the size of allocated memory on both platforms, thus I think the loop
is not optimized out.
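
A minimal sketch of one way to make that less of a guess: route the malloc result through a volatile pointer so the compiler has little room to drop the malloc/free pair, and time the loop for two cycle counts. The names `run_cycles` and `seconds_for` are hypothetical and not part of the attached test program; this is an illustration under those assumptions, not a definitive benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: performs `cycles` malloc/free pairs of `size` bytes
// and returns how many allocations succeeded.
std::size_t run_cycles(std::size_t cycles, std::size_t size) {
    std::size_t done = 0;
    for (std::size_t i = 0; i < cycles; i++) {
        // Storing the result in a volatile pointer makes the value observable,
        // so the compiler is much less likely to remove the malloc/free pair.
        char *volatile p = static_cast<char *>(std::malloc(size));
        if (p == nullptr)
            break;
        if (size > 0)
            p[0] = 1;  // touch the memory, as the C test program in this thread does
        std::free(p);
        done++;
    }
    return done;
}

// Hypothetical helper: wall-clock seconds spent in run_cycles().
double seconds_for(std::size_t cycles, std::size_t size) {
    auto start = std::chrono::steady_clock::now();
    run_cycles(cycles, size);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing `seconds_for(n, 1024)` with `seconds_for(2 * n, 1024)` for a large `n` should show roughly doubled time if the loop is really executed.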

OpenBSD consistently needs ~10x longer to finish the test compared to
Ubuntu. Regarding the size of allocations, we operate on relatively
short strings, e.g., 25 bytes long.

Cheers,
Marek

#include <cstdlib>   // malloc, free, atoi
#include <iostream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

class dynamic_buffer {
public:
    dynamic_buffer(size_t size)
        : m_memory(NULL) {
        m_memory = (char *)malloc(size);
        if (m_memory == NULL) {
            throw std::runtime_error("out of memory");
        }
    }

    dynamic_buffer() = delete;
    dynamic_buffer(const dynamic_buffer&) = delete;
    dynamic_buffer(dynamic_buffer&&) noexcept = delete;
    dynamic_buffer& operator=(const dynamic_buffer&) = delete;
    dynamic_buffer& operator=(dynamic_buffer&&) noexcept = delete;

    char* raw_memory() {
        return m_memory;
    }

    ~dynamic_buffer() {
        if (m_memory != NULL) {
            free(m_memory);
        }
    }

private:
    char *m_memory;
};

static std::string help(const std::string &program_name) {
    std::stringstream help;
    help << program_name
         << " --cycles <number of cycles> --size <size of buffer>"
         << std::endl;
    return help.str();
}

int main(int argc, const char *argv[]) {
    try {
        if (argc != 5) {
            throw std::logic_error(help(std::string(argv[0])));
        }
        int number_of_cycles = atoi(argv[2]);
        int size_of_buffer = atoi(argv[4]);
        for (int i = 0; i < number_of_cycles; i++) {
            dynamic_buffer buffer(size_of_buffer);
            // Read from the (uninitialized) buffer so the allocation has a use.
            if (*reinterpret_cast<unsigned int*>(buffer.raw_memory()) == 0xDEADBEEF) {
                std::cout << "Bingo!" << std::endl;
            }
        }
        return 0;
    } catch (const std::exception &e) {
        std::cerr << e.what() << std::endl;
    } catch (...) {
        std::cerr << "Something went really wrong" << std::endl;
    }
    return 1;
}
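
The hints quoted earlier in the thread (use reserve(), prefer moves over copies) can be illustrated with a rough sketch that counts allocator calls. Everything below (`counting_allocator`, `counted_string`, `allocations_while_appending`, `allocations_for_transfer`) is a hypothetical illustration, not code from the product or from the test program above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Illustration only: a global counter, not thread-safe.
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to std::allocator and counts allocate() calls.
template <typename T>
struct counting_allocator {
    using value_type = T;

    counting_allocator() = default;
    template <typename U>
    counting_allocator(const counting_allocator<U> &) noexcept {}

    T *allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T *p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T> &,
                const counting_allocator<U> &) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T> &,
                const counting_allocator<U> &) noexcept { return false; }

using counted_string =
    std::basic_string<char, std::char_traits<char>, counting_allocator<char>>;

// Appends n characters one by one, optionally after a single reserve(n),
// and returns the number of heap allocations this caused.
std::size_t allocations_while_appending(std::size_t n, bool use_reserve) {
    counted_string s;
    g_allocations = 0;
    if (use_reserve)
        s.reserve(n);
    for (std::size_t i = 0; i < n; i++)
        s.push_back('x');
    return g_allocations;
}

// Builds a second string from a 1000-character one, by move or by copy,
// and returns the number of heap allocations the transfer caused.
std::size_t allocations_for_transfer(bool use_move) {
    counted_string source(1000, 'x');
    g_allocations = 0;
    if (use_move) {
        counted_string target(std::move(source));  // takes over the buffer
        (void)target;
    } else {
        counted_string target(source);             // allocates a new buffer
        (void)target;
    }
    return g_allocations;
}
```

Appending 1000 characters after `reserve(1000)` needs a single allocation, while growing on demand needs several; moving the string needs none, copying needs one. Each allocation saved this way is one less malloc/free pair on either platform.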