December 18, 2025 at 3:02 PM, "Michael Kelly" <[email protected]> wrote:
> Hi All,
>
> With recent contributions to Hurd IRQ management I was finally able to run GNU/Hurd on my vintage X64 hardware for the purposes of stress testing using stress-ng. I've been running similar tests on virtual machines over the last 6 months or so, and I was interested in how stable these tests were on standalone hardware.
>
> It was immediately obvious that swapping performance during the intensive paging operations was very much reduced compared to the virtual machine. That in itself is not surprising, but the performance was so poor that system lockups (which do occur similarly on virtual machines) were almost immediate. In fact, I haven't been able to run this single 2 minute test case to completion without the kernel ending in a 'system lock' (awaiting a page in):
>
> # stress-ng -t 2m --metrics --vm 32 --vm-bytes 1800M --mmap 32 --mmap-bytes 1800M --page-in
>
> My machine has a traditional rotating disc and 4GB of RAM. Running the above on a similarly sized virtual machine uses around 1.3G of swap and does succeed approximately 90% or more of the time. My suspicion is that the longer page-in times associated with a real disc (rather than a possibly cached virtual disc) result in a greater likelihood of system lock. I concluded that, in order to make meaningful observations of this type of system performance on actual hardware, I needed to make some improvements to the swapping performance.
>
> The current page replacement policy in gnumach/vm_page is documented within the source. It describes the policy of preferring to page out external pages (mmap) over internal pages (anonymous memory) in order to minimise the use of the default pager, which is described as unreliable. I've been stress testing GNU/Hurd for quite some time now and have seen many instances of system freezes, but I do not recall any that were certainly caused by the default pager. The most common underlying cause involves a request to an external page that cannot make progress due to either a deadlock elsewhere or assertions within the ext2fs or storeio servers.
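>
> Purely as an illustration of that documented policy (this is not the actual gnumach code, and the names below are made up rather than the real vm_page identifiers), the current behaviour amounts to something like:
>
>     /* Illustrative sketch only: stock policy, where external pages are
>      * always preferred as eviction victims so that the default pager is
>      * used as little as possible. */
>     struct page {
>         struct page *next;
>         int external;            /* backed by a memory object (mmap) */
>     };
>
>     /* Hypothetical inactive queue; in reality this lives in the vm_page
>      * queue structures. */
>     static struct page *inactive_queue;
>
>     static struct page *
>     choose_victim(void)
>     {
>         struct page *p;
>
>         /* First pass: look for an external (file-backed) page. */
>         for (p = inactive_queue; p != NULL; p = p->next)
>             if (p->external)
>                 return p;
>
>         /* Only when no external page is available fall back to the head
>          * of the queue, which at this point can only be an internal
>          * (anonymous) page that must go to the default pager. */
>         return inactive_queue;
>     }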
>
> I have spent some time recently developing some alternative page replacement implementations of varying complexity. One of the simplest of these (referred to from here on as 'My_patch') actually results in very significant performance improvements generally, and sufficient improvement to allow the stress test case above to complete most of the time. Before I offer this as a patch series, I'd like to present the performance improvements it results in and a description of how it achieves them.
>
> I've benchmarked the following 2 test cases:
>
> 1) SNG10
> This is simply a tenfold iteration of the 2 minute stress-ng test case shown above. Whilst this is a good driver of the system into a heavy paging state, it doesn't really represent anything that might normally be run on a machine.
>
> 2) TCM3
> This is a test case that might be more likely to occur normally. I found some C++ that uses heavily templated code, which results in large compilation process sizes. Specifically, I used MatrixSine.cpp, which I found as example code within the libeigen package. Running 3 concurrent compilations results in swap usage of around 500M on my 4GB test machines:
>
> # /usr/bin/x86_64-gnu-g++-14 -I/usr/include/eigen3 -g -O2 -o matrix_sine_1 MatrixSine.cpp &
> # /usr/bin/x86_64-gnu-g++-14 -I/usr/include/eigen3 -g -O2 -o matrix_sine_2 MatrixSine.cpp &
> # /usr/bin/x86_64-gnu-g++-14 -I/usr/include/eigen3 -g -O2 -o matrix_sine_3 MatrixSine.cpp &
>
> These are the various machine configurations used with each test case:
>
> 1) VMHURD_REL: VM using Hurd (GNU-Mach 1.8+git20250731-8 amd64)
>    4096M RAM (3610M post boot), 2.8G swap
>
> 2) VMHURD_PAT: VM using Hurd (GNU-Mach 1.8+git20250731-8 amd64 + 'My_patch')
>    4096M RAM (3610M post boot), 2.8G swap
>
> 3) VMLINX: VM using Debian (6.12.48+deb13-amd64)
>    3920M RAM (run with maxcpus=1 and has 3610M post boot), 4G swap
>
> 4) HWHURD_REL: Advent hardware using Hurd (GNU-Mach 1.8+git20250731-8 amd64)
>    4096M (3374M available after boot), 10G swap
>
> 5) HWHURD_PAT: Advent hardware using Hurd (GNU-Mach 1.8+git20250731-8 amd64 + 'My_patch')
>    4096M (3374M available after boot), 10G swap
>
> 6) HWLINX: Advent hardware using Debian (6.12.48+deb13-amd64)
>    4096M (run with maxcpus=1 and has 3325M available after boot)
>
> I have a number of my own local glibc, gnumach and hurd patches that fix various bugs exposed by the stress tests but which I have not yet submitted for merging. These do not affect swap performance.
>
> These figures show averages for a number of runs of TCM3:
>
> VMHURD_REL/TCM3: 11m12s (pagein=2225294, pageout=1972821)
> VMHURD_PAT/TCM3:  3m07s (pagein= 179883, pageout= 281279)
>
> HWHURD_REL/TCM3: Unable to complete any test case
> HWHURD_PAT/TCM3:  8m59s (pagein= 256466, pageout= 373059)
>
> HWLINX/TCM3:      2m12s (pagein=  66796, pageout= 236674)
>
> The VMLINX times are significantly shorter than VMHURD_PAT, but due to differences in virtual machine optimisations it doesn't seem meaningful to report them. The above does show, however, that on hardware, and even with my patched kernel, Linux is around 4 times faster than GNU/Hurd in this test case.
>
> The stress-ng test case metrics give an indication of the number of mmap and vm operations completed. Here are the averaged totals for a number of test cases:
>
> VMHURD_REL/SNG10: (mmap)  50.1, (vm) 206217
> VMHURD_PAT/SNG10: (mmap) 327.1, (vm) 840666
> HWHURD_REL/SNG10: Unable to complete any test run
> HWHURD_PAT/SNG10: (mmap) 183,   (vm) 560018
>
> TCM3 completes over 3 times faster with 'My_patch', and approximately 4 times as many stress-ng operations are completed per iteration.
>
> All 'My_patch' actually does is remove the restriction of always prioritising external pages for page eviction. There are quite a few lines of code changed, but almost all of them are trivial, really. The changes result in 2 main behavioural differences:
>
> 1) The vm_page code currently always attempts to find an external page before looking for internal ones. I have changed a number of functions to be told explicitly whether to choose external or internal.
>
> 2) The vm_page code currently counts the number of active and inactive pages, and the counts include all external and internal pages. I have changed the code to maintain separate counts for active_internal, active_external, inactive_internal and inactive_external pages.
>
> The final patch in the series uses an extremely unsophisticated algorithm to determine what type of page to evict next. It still chooses external pages until they represent less than 1 in 25 of all (active or inactive) pages, and at that point chooses internal. I do not propose this as a long-term strategy, but simply as a starting point for a more meaningful eviction policy. It is quite frankly ludicrously simplistic, but nevertheless seemingly effective for these test cases at least.
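>
> To make the shape of the change concrete, here is a minimal sketch of the idea (illustrative only; these are not the real identifiers from gnumach's vm_page code or from my patch):
>
>     /* Illustrative sketch only: separate counters per page type and a
>      * selector that is told explicitly which type to evict. */
>     #include <stdbool.h>
>
>     struct page;
>
>     /* Point 2 above: separate active/inactive counts per page type. */
>     static unsigned long active_external, inactive_external;
>     static unsigned long active_internal, inactive_internal;
>
>     /* The "1 in 25" rule: keep choosing external pages until they fall
>      * below 1/25th of all active + inactive pages, then switch to
>      * internal pages. */
>     static bool
>     should_evict_external(void)
>     {
>         unsigned long external = active_external + inactive_external;
>         unsigned long total = external + active_internal + inactive_internal;
>
>         if (total == 0)
>             return false;
>
>         return external * 25 >= total;
>     }
>
>     /* Point 1 above: the eviction path is told explicitly which type of
>      * page to look for, rather than always preferring external pages.
>      * (Hypothetical helper, not a real gnumach function.) */
>     struct page *evict_one_page(bool external);
>
>     static struct page *
>     balance_once(void)
>     {
>         return evict_one_page(should_evict_external());
>     }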
>
> There are many parts of the current implementation that are negatively affecting performance. I have some speculative implementations that reduce the HWHURD_PAT/TCM3 combination from the latest 9m to around 5m, but they can be discussed later if appropriate.
>
> I'd welcome feedback on whether 'My_patch' should be submitted for consideration.
>
> Regards,
>
> Mike.

Thanks for stress testing the Hurd, Mike! It's always fun seeing you tackle what appears to me to be some of the Hurd's stability issues.

Joshua
