Moin moin,

Policeman Jenkins got new hardware yesterday - no functional changes.

Background: The old server had some strange problems with the network adapter (Intel's "igb" kernel driver), which kept reporting "Detected Tx Unit Hang". This caused some short downtimes, and the monitoring complained all the time about lost pings, which drove me crazy over the weekend. It worked better after a restart and also with a kernel downgrade, but as I was about to replace the machine with a newer one anyway, I ordered a replacement with the newer hardware version (previously it was a Hetzner AX51-NVME; now it is a Hetzner AX52).

The migration started yesterday at lunchtime in Europe (12:00 CET). Both servers were booted from the network into the datacenter's recovery environment with a temporary IP; I then mounted both root disks and did a large rsync (with checksums, extended attributes, numeric uid/gid and the delete option). Luckily this worked with the old server (the Intel adapter did not break). The whole downtime should have taken only 1 to 1.5 hours (the time that the copy at 1 Gbit/s and the reconfiguration need), but unfortunately PCI Express on the new server complained about (recoverable) errors in the NVMe communication. After some support roundtrips (they first replaced only the failing NVMe controller, which did not help), they replaced the whole server.
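The exact rsync command line is not given above. As a rough sketch of such a copy, assuming it is pulled over SSH from the new server, something like the following would cover the listed options. The host name and mount points are hypothetical placeholders, and the function only prints the command instead of running it:

```shell
#!/bin/sh
# Sketch only: host name and mount points are hypothetical, not taken from the mail.
# Assumption: both machines run the rescue system and the copy is pulled
# from the old server over SSH.

build_rsync_cmd() {
    old_host="$1"   # address of the old server
    src="$2"        # mounted root disk of the old server (trailing slash: copy contents)
    dst="$3"        # mounted root disk of the new server
    # -a  archive mode (permissions, times, symlinks, devices, ...)
    # -H  preserve hard links
    # -A  preserve ACLs
    # -X  preserve extended attributes
    # --checksum     compare files by checksum instead of size and mtime
    # --numeric-ids  keep numeric uid/gid (the rescue system has other user names)
    # --delete       remove files on the target that no longer exist on the source
    printf 'rsync -aHAX --checksum --numeric-ids --delete root@%s:%s %s\n' \
        "$old_host" "$src" "$dst"
}

# Print the command for review; run it by hand once it looks right.
build_rsync_cmd old-server.example /mnt/oldroot/ /mnt/newroot/
```

Printing the command first is a deliberate choice here: with --delete, a swapped source and target would wipe the wrong disk.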

At 18:30 CET, I started the copy to the new server again and all went well; dmesg showed no PCI Express checksum errors. Finally, after fixing the boot setup (the old server used MBR, the new one EFI), the server was mounted at the original location by the team and all IPv4 addresses and the IPv6 network were available. Since then (approx. 20:30 CET), Policeman Jenkins is back up and running.

The TODOs for the future:

 * Replace the macOS VM and update it to a new version (it's
   complicated, as it is a "Hackintosh", so it shouldn't be there
   according to Apple)
 * Possibly migrate away from VirtualBox to KVM, but it's unclear if
   Hackintoshes work there.

Have fun with the new hardware; the builds on the Lucene main branch are now 1.5 times faster (10 instead of 15 minutes).

The new hardware is described here: https://www.hetzner.com/dedicated-rootserver/ax52/; it has AVX-512... let's see what comes out. No test failures yet.

vendor_id       : AuthenticAMD
cpu family      : 25
model           : 97
model name      : AMD Ryzen 7 7700 8-Core Processor
stepping        : 2
microcode       : 0xa601209
cpu MHz         : 5114.082
cache size      : 1024 KB
physical id     : 0
siblings        : 16
core id         : 7
cpu cores       : 8
apicid          : 15
initial apicid  : 15
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
bogomips        : 7585.28
TLB size        : 3584 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
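The AVX-512 subsets are easy to miss in a flags line of that length. A small sketch that pulls them out of cpuinfo-style text could look like this; the function name is made up for illustration, and it only inspects the first "flags" line it finds:

```shell
#!/bin/sh
# Sketch: list the AVX-512 feature flags from cpuinfo-style text.
# list_avx512_flags is a hypothetical helper name, not an existing tool.

list_avx512_flags() {
    # $1: file to read, defaults to /proc/cpuinfo
    # Take the first "flags" line, split the part after ": " on spaces,
    # keep every flag that starts with "avx512", then sort and deduplicate.
    awk -F': ' '/^flags/ {
        n = split($2, f, " ")
        for (i = 1; i <= n; i++)
            if (f[i] ~ /^avx512/) print f[i]
        exit
    }' "${1:-/proc/cpuinfo}" | LC_ALL=C sort -u
}

# Example: show the flags of the current machine (Linux only).
if [ -r /proc/cpuinfo ]; then
    list_avx512_flags /proc/cpuinfo
fi
```

Applied to the dump above, this should list avx512f, avx512dq, avx512ifma, avx512cd, avx512bw, avx512vl, avx512vbmi and the underscore-named variants such as avx512_vnni and avx512_bf16.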

# lspci | fgrep -i volati
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
02:00.0 Non-Volatile memory controller: Micron Technology Inc 3400 NVMe SSD [Hendrix]

I have no idea why the replacement server has two different NVMe SSDs, but you never know beforehand what you get! From the SMART info I know that both SSDs were fresh (only 6 hours of total uptime).

Uwe

--

Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de
