Moin moin,
Policeman Jenkins got new hardware yesterday - no functional changes.
Background: The old server had strange problems with its network
adapter (Intel's "igb" kernel driver), which kept reporting "Detected Tx
Unit Hang". This caused some short downtimes, and the monitoring
complained all the time about lost pings, which drove me crazy over the
weekend. It worked better after a restart and also with a kernel
downgrade, but as I was about to replace the machine with a newer one
anyway, I ordered a replacement with the new hardware version
(previously it was Hetzner AX51-NVME; now it is Hetzner AX52).
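
For anyone hunting the same symptom, here is a minimal sketch for
counting the hang messages in a kernel log. The function name
count_tx_hangs is made up for illustration; only the message text is
taken from the driver output quoted above.

```shell
# Count kernel log lines that contain the igb hang message.
# Reads the log from stdin and prints the number of matching lines.
count_tx_hangs() {
    # grep -c exits non-zero when the count is 0, so mask that.
    grep -c 'Detected Tx Unit Hang' || true
}

# Possible usage (reading the kernel log may need root):
#   dmesg | count_tx_hangs
#   journalctl -k | count_tx_hangs
```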
The migration started yesterday at lunch time in Europe (12:00 CET).
Both servers were booted from the network into the datacenter's
recovery environment, each with a temporary IP; I then mounted both
root disks and did a large rsync (with checksums, extended attributes,
numeric uid/gid and the delete option). Luckily this worked with the old
server (the Intel adapter did not break). The whole downtime should have
taken only 1 to 1.5 hours (the time needed for the copy at 1 Gbit/s plus
reconfiguration), but unfortunately PCI Express on the new server
complained about (recoverable) errors in the NVMe communication.
After some support round trips (they first replaced only the failing
NVMe controller, which did not help), they replaced the whole server.
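
The rsync step described above could look roughly like the sketch
below. This is not the exact command line that was used: the wrapper
name rsync_migrate, the mount points /mnt/old/ and /mnt/new/ and the
address 203.0.113.10 are placeholders made up for illustration. Only
the options (checksums, extended attributes, numeric uid/gid, delete)
follow the description.

```shell
# Hypothetical wrapper around rsync with the options named above.
# Extra options and the source/destination are passed through as-is.
rsync_migrate() {
    # --archive      recurse and keep permissions, owners, times, symlinks
    # --checksum     compare files by checksum instead of size and mtime
    # --xattrs       copy extended attributes
    # --numeric-ids  keep numeric uid/gid instead of mapping by name
    # --delete       remove files on the target that are gone on the source
    rsync --archive --checksum --xattrs --numeric-ids --delete "$@"
}

# Possible usage from the old server's rescue system (placeholder paths
# and address; try --dry-run first):
#   rsync_migrate --dry-run /mnt/old/ root@203.0.113.10:/mnt/new/
#   rsync_migrate /mnt/old/ root@203.0.113.10:/mnt/new/
```

The trailing slash on the source path matters: it copies the contents
of the directory rather than the directory itself.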
At 18:30 CET, I started the copy to the new server again and all went
well; dmesg showed no PCI Express checksum errors. Finally, after fixing
boot (the old server used MBR, the new one EFI), the server was mounted
at the original location by the team, and all IPv4 addresses and the
IPv6 network were available. Since then (approx. 20:30 CET), Policeman
Jenkins is back up and running.
The TODOs for the future:
* Replace the macOS VM and update it to a new version (it's
complicated, as it is a "Hackintosh", so it shouldn't be there
according to Apple)
* Possibly migrate away from VirtualBox to KVM, but it's unclear if
Hackintoshes work there.
Have fun with the new hardware, the builds on Lucene main branch are now
1.5 times faster (10 instead of 15 minutes).
The new hardware is described here:
https://www.hetzner.com/dedicated-rootserver/ax52/; it has AVX-512...
let's see what comes out. No test failures yet.
vendor_id : AuthenticAMD
cpu family : 25
model : 97
model name : AMD Ryzen 7 7700 8-Core Processor
stepping : 2
microcode : 0xa601209
cpu MHz : 5114.082
cache size : 1024 KB
physical id : 0
siblings : 16
core id : 7
cpu cores : 8
apicid : 15
initial apicid : 15
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl
xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq
monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb
bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2
ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms
invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt
clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16
clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke
avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg
avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
amd_lbr_pmc_freeze
bugs : sysret_ss_attrs spectre_v1 spectre_v2
spec_store_bypass srso
bogomips : 7585.28
TLB size : 3584 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
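
To pick the AVX-512 related entries out of a flags line like the one
above, a small filter is enough. This is only a sketch; the function
name avx512_flags is made up for illustration.

```shell
# Print every CPU flag that starts with "avx512", one per line.
# Expects a "flags : ..." line (as in /proc/cpuinfo) on stdin.
avx512_flags() {
    # Split on spaces and tabs, then keep only the avx512* words.
    tr -s ' \t' '\n' | grep '^avx512'
}

# Possible usage on a Linux box (first "flags" line only):
#   grep -m1 '^flags' /proc/cpuinfo | avx512_flags
```

For the dump above this should list avx512f, avx512dq, avx512ifma,
avx512cd, avx512bw, avx512vl, avx512_bf16, avx512vbmi, avx512_vbmi2,
avx512_vnni, avx512_bitalg and avx512_vpopcntdq.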
# lspci | fgrep -i volati
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe
SSD Controller PM9A1/PM9A3/980PRO
02:00.0 Non-Volatile memory controller: Micron Technology Inc 3400 NVMe
SSD [Hendrix]
I have no idea why the replacement server has two different NVMe SSDs,
but you never know in advance what you will get! From the SMART data I
know that both SSDs were fresh (only 6 hours of total uptime).
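
The uptime can be read from the SMART output roughly like this. It is a
sketch that assumes smartmontools is installed and that the NVMe report
contains a "Power On Hours:" line; the function name power_on_hours and
the device name /dev/nvme0 are placeholders.

```shell
# Extract the power-on hours from "smartctl -a" output given on stdin.
power_on_hours() {
    # Take the value after the colon and drop padding spaces and
    # thousands separators.
    awk -F: '/^Power On Hours/ { gsub(/[ ,]/, "", $2); print $2 }'
}

# Possible usage (needs root; adjust the device name):
#   smartctl -a /dev/nvme0 | power_on_hours
```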
Uwe
--
Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de
eMail: u...@thetaphi.de