Hi All,

With recent contributions to Hurd IRQ management I was finally able to run GNU/Hurd on my vintage x86_64 hardware in order to stress test it with stress-ng. I have been running similar tests on virtual machines for the last six months or so, and I was interested in how stable they would be on standalone hardware.

It was immediately obvious that swapping performance during intensive paging was much worse than on the virtual machine. That in itself is not surprising, but the performance was so poor that system lockups (which also occur on virtual machines) were almost immediate. In fact, I have not been able to run the following single 2-minute test case to completion without the kernel ending in a 'system lock' (waiting for a page-in):

# stress-ng -t 2m --metrics --vm 32 --vm-bytes 1800M --mmap 32 --mmap-bytes 1800M --page-in

My machine has a traditional rotating disc and 4GB of RAM. Running the above on a similarly sized virtual machine uses around 1.3G of swap and succeeds roughly 90% of the time or more. My suspicion is that the longer page-in times associated with a real disc (rather than a possibly cached virtual disc) result in a greater likelihood of a system lock. I concluded that, in order to make meaningful observations of this kind of system performance on actual hardware, I needed to improve the swapping performance.

The current page replacement policy in gnumach/vm_page is documented within the source. It describes a policy of preferring to page out external pages (mmap) over internal pages (anonymous memory) in order to minimise use of the default pager, which is described as unreliable. I have been stress testing GNU/Hurd for quite some time now and have seen many instances of system freezes, but I do not recall any that were definitely caused by the default pager. The most common underlying cause involves a request for an external page that cannot make progress, due either to a deadlock elsewhere or to assertions in the ext2fs or storeio servers.
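
For anyone not familiar with that comment, here is a minimal, hypothetical C sketch of the preference it describes. The names and structure are mine, not the actual gnumach identifiers; it only shows the shape of the current policy:

#include <stdbool.h>
#include <stddef.h>

struct page {
    bool external;      /* backed by a memory object (mmap) */
    struct page *next;  /* next page on the inactive queue  */
};

/*
 * Simplified view of the current policy: always scan the inactive
 * queue for an external page first, and only fall back to an
 * internal (anonymous) page when no external page is left, so that
 * the default pager is used as little as possible.
 */
static struct page *choose_victim(struct page *inactive_queue)
{
    for (struct page *p = inactive_queue; p != NULL; p = p->next)
        if (p->external)
            return p;

    /* No external page available: reluctantly evict an internal one. */
    return inactive_queue;
}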

I have recently spent some time developing alternative page replacement implementations of varying complexity. One of the simplest of these (referred to from here on as 'My_patch') results in very significant performance improvements generally, and enough improvement to allow the stress test case above to complete most of the time. Before I offer this as a patch series, I'd like to present the performance improvements it achieves and describe how it achieves them.

I've benchmarked the following 2 test cases:

1) SNG10
This is simply ten iterations of the 2-minute stress-ng test case shown above. While it is good at driving the system into a heavy paging state, it doesn't really represent anything that might normally be run on a machine.

2) TCM3
This test case is more representative of normal use. I found some heavily templated C++ code that results in large compiler process sizes: specifically, MatrixSine.cpp, which is shipped as example code in the libeigen package. Running 3 concurrent compilations results in swap usage of around 500M on my 4GB test machines:

# /usr/bin/x86_64-gnu-g++-14 -I/usr/include/eigen3 -g -O2 -o matrix_sine_1 MatrixSine.cpp &
# /usr/bin/x86_64-gnu-g++-14 -I/usr/include/eigen3 -g -O2 -o matrix_sine_2 MatrixSine.cpp &
# /usr/bin/x86_64-gnu-g++-14 -I/usr/include/eigen3 -g -O2 -o matrix_sine_3 MatrixSine.cpp &

These are the various machine configurations used with each test case:

1) VMHURD_REL: VM using Hurd (GNU-Mach 1.8+git20250731-8 amd64)
  4096M RAM (3610M post boot), 2.8G swap

2) VMHURD_PAT: VM using Hurd (GNU-Mach 1.8+git20250731-8 amd64 + 'My_patch')
  4096M RAM (3610M post boot), 2.8G swap

3) VMLINX: VM using Debian (6.12.48+deb13-amd64)
  3920M RAM (run with maxcpus=1 and has 3610M post boot), 4G swap

4) HWHURD_REL: Advent hardware using Hurd (GNU-Mach 1.8+git20250731-8 amd64)
  4096M (Available 3374M after boot), 10G swap

5) HWHURD_PAT: Advent hardware using Hurd (GNU-Mach 1.8+git20250731-8 amd64 + 'My_patch')
  4096M (Available 3374M after boot), 10G swap

6) HWLINX: Advent hardware using Debian (6.12.48+deb13-amd64)
  4096M (run with maxcpus=1 and has 3325M available after boot)

I have a number of my own local glibc, gnumach and hurd patches that fix various bugs exposed by the stress tests but which I have not yet submitted for merging. These do not affect swap performance.

These figures show averages for a number of runs of TCM3:

VMHURD_REL/TCM3: 11m12s (pagein=2225294,pageout=1972821)
VMHURD_PAT/TCM3:  3m07s (pagein= 179883,pageout= 281279)

HWHURD_REL/TCM3:  Unable to complete any test case
HWHURD_PAT/TCM3:  8m59s (pagein=256466,pageout=373059)

HWLINX/TCM3:      2m12s (pagein=66796,pageout=236674)

The VMLINX times are significantly shorter than VMHURD_PAT, but due to differences in virtual machine optimisations it doesn't seem meaningful to report them. The figures above do show, however, that on hardware, and even with my patched kernel, Linux is around 4 times faster than GNU/Hurd in this test case.

The stress-ng test case metrics give an indication of the number of mmap and vm operations completed. Here are the averaged totals for a number of test runs:

VMHURD_REL/SNG10: (mmap) 50.1, (vm) 206217
VMHURD_PAT/SNG10: (mmap) 327.1, (vm) 840666
HWHURD_REL/SNG10: Unable to complete any test run
HWHURD_PAT/SNG10: (mmap) 183, (vm) 560018

TCM3 completes over 3 times faster with 'My_patch', and in SNG10 approximately 4 times as many stress-ng operations are completed per iteration.

All 'My_patch' actually does is remove the restriction of always prioritising external pages for eviction. Quite a few lines of code are changed, but almost all of them trivially. The changes result in 2 main behavioural differences:

1) The vm_page code currently always attempts to find an external page before looking for an internal one. I have changed a number of functions so that they are told explicitly whether to choose an external or an internal page.

2) The vm_page code currently counts the number of active and inactive pages, with each count covering both external and internal pages. I have changed the code to maintain separate counts for active_internal, active_external, inactive_internal and inactive_external pages (both changes are sketched in the example after this list).
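
The sketch below is only illustrative; the names are hypothetical rather than the actual gnumach functions, but it shows the shape of both changes: a selection helper that is told which class of page to look for, and split counters maintained by the queue code.

#include <stdbool.h>
#include <stddef.h>

/* Split counters replacing the combined active/inactive counts.
 * Names are illustrative, not the actual gnumach identifiers.     */
static unsigned long active_internal_count;
static unsigned long active_external_count;
static unsigned long inactive_internal_count;
static unsigned long inactive_external_count;

struct page {
    bool external;
    struct page *next;
};

/* The caller now states explicitly which class of page it wants,
 * rather than the queue code always trying external pages first.  */
static struct page *pull_inactive_page(struct page *inactive_queue,
                                       bool want_external)
{
    for (struct page *p = inactive_queue; p != NULL; p = p->next)
        if (p->external == want_external)
            return p;
    return NULL;
}

/* Counter maintenance when a page is moved to the inactive queue.  */
static void page_made_inactive(const struct page *p)
{
    if (p->external)
        inactive_external_count++;
    else
        inactive_internal_count++;
}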

The final patch in the series uses an extremely unsophisticated algorithm to determine which type of page to evict next. It still chooses external pages until they represent less than 1 in 25 of all (active or inactive) pages, and at that point it switches to internal pages. I do not propose this as a long-term strategy, simply as a starting point for a more meaningful eviction policy. It is, quite frankly, ludicrously simplistic, but nevertheless seemingly effective, for these test cases at least.
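
For concreteness, the decision amounts to something like the following sketch; the helper name is hypothetical and the code is only meant to illustrate the 1-in-25 idea, not reproduce the patch:

#include <stdbool.h>

/*
 * Keep evicting external pages until they make up less than 1 in 25
 * of all (active + inactive) pages, then switch to internal pages.
 * The 1/25 threshold and the names here are illustrative only.
 */
static bool should_evict_external(unsigned long active_internal,
                                  unsigned long active_external,
                                  unsigned long inactive_internal,
                                  unsigned long inactive_external)
{
    unsigned long external = active_external + inactive_external;
    unsigned long total    = external + active_internal + inactive_internal;

    if (total == 0)
        return false;

    /* external / total >= 1/25, written without floating point. */
    return external * 25 >= total;
}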

There are many parts of the current implementation that negatively affect performance. I have some speculative implementations that reduce the HWHURD_PAT/TCM3 time from the roughly 9m above to around 5m, but they can be discussed later if appropriate.

I'd welcome feedback on whether 'My_patch' should be submitted for consideration.

Regards,

Mike.
