Woohoo! Thanks, Uwe; exciting that you were able to get 2x the lifespan out of the drives. Let's go for 4x this time!
On Tue, Mar 18, 2025 at 12:53 PM Uwe Schindler <u...@thetaphi.de> wrote:
>
> Moin moin,
>
> Policeman Jenkins got new hardware yesterday - no functional changes.
>
> Background: The old server had some strange problems with the network
> adapter (Intel's "igb" kernel driver) reporting "Detected Tx Unit Hang". This
> caused some short downtimes, and the monitoring complained all the time about
> lost pings, which drove me crazy at the weekend. It worked better after a
> restart and also with a kernel downgrade, but as I was about to replace the
> machine with a newer one anyway, I ordered the replacement in a newer
> hardware version (previously it was Hetzner AX51-NVME; now it is Hetzner
> AX52).
>
> The migration started yesterday at lunchtime in Europe (12:00 CET) by booting
> both servers from the network into the datacenter's recovery environment with
> a temporary IP, then mounting both root disks and doing a large rsync (with
> checksums, extended attributes, numeric uid/gid and the delete option).
> Luckily this worked with the old server (the Intel adapter did not break).
> The whole downtime should have taken only 1 to 1.5 hours (the time that the
> copy at 1 Gbit/s and the reconfiguration need), but unfortunately PCI Express
> on the new server complained about (recoverable) errors in the NVMe
> communication. After some support round trips (they first replaced only the
> failing NVMe controller, which did not help), they replaced the whole server.
>
> At 18:30 CET, I started the copy to the new server again and all went well;
> dmesg showed no PCI Express checksum errors. Finally, after fixing boot (the
> old server used MBR, the new one EFI), the server was mounted at the original
> location by the team and all IPv4 addresses and the IPv6 network were
> available. Since then (approx. 20:30 CET), Policeman Jenkins is back and
> running.
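For reference, the rsync step described above might look roughly like the sketch below. The mount points /mnt/old and /mnt/new are hypothetical, and the exact command line used is not given in the mail. The options are the standard rsync ones for checksums (-c), extended attributes (-X), numeric uid/gid (--numeric-ids) and deletion (--delete); adding -a, -H and -A is my assumption for a full root-disk copy. The script only prints the command so it can be reviewed before anyone runs it by hand.

```shell
#!/bin/sh
# Sketch only: OLD_ROOT and NEW_ROOT are hypothetical mount points for the
# two root disks inside a rescue environment; they are not from the mail.
OLD_ROOT="${OLD_ROOT:-/mnt/old}"
NEW_ROOT="${NEW_ROOT:-/mnt/new}"

migrate_cmd() {
    # -a  archive mode (permissions, times, symlinks, devices)
    # -H  preserve hard links        -A  preserve ACLs
    # -X  preserve extended attrs    -c  compare files by checksum
    # --numeric-ids  keep raw uid/gid instead of mapping by name
    # --delete       remove files on the target that are gone on the source
    printf 'rsync -aHAXc --numeric-ids --delete %s/ %s/\n' \
        "$OLD_ROOT" "$NEW_ROOT"
}

# Print the command for review; nothing is copied by this script.
migrate_cmd
```

The trailing slashes matter: they make rsync copy the contents of the source directory into the target instead of creating a nested directory.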
>
> The TODOs for the future:
>
> - Replace the macOS VM and update it to a new version (it's complicated, as
>   it is a "Hackintosh", so it shouldn't be there according to Apple).
> - Possibly migrate away from VirtualBox to KVM, but it's unclear if
>   Hackintoshes work there.
>
> Have fun with the new hardware; the builds on the Lucene main branch are now
> 1.5 times faster (10 instead of 15 minutes).
>
> The new hardware is described here:
> https://www.hetzner.com/dedicated-rootserver/ax52/
> It has AVX-512... let's see what comes out. No test failures yet.
>
> vendor_id : AuthenticAMD
> cpu family : 25
> model : 97
> model name : AMD Ryzen 7 7700 8-Core Processor
> stepping : 2
> microcode : 0xa601209
> cpu MHz : 5114.082
> cache size : 1024 KB
> physical id : 0
> siblings : 16
> core id : 7
> cpu cores : 8
> apicid : 15
> initial apicid : 15
> fpu : yes
> fpu_exception : yes
> cpuid level : 16
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc
> cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1
> sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic
> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce
> topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3
> hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase
> bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap
> avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec
> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk
> avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv
> svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
> pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke
> avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq
> rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
> bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
> bogomips : 7585.28
> TLB size : 3584 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
>
> # lspci | fgrep -i volati
> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> 02:00.0 Non-Volatile memory controller: Micron Technology Inc 3400 NVMe SSD [Hendrix]
>
> I have no idea why the replacement server has two different NVMe SSDs, but
> you never know beforehand what you will get! From the SMART info I know that
> both SSDs were fresh (6 hours total uptime only).
>
> Uwe
>
> --
>
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
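
In case it is useful for comparing build machines: the AVX-512 extensions can be pulled out of the flags field shown in the cpuinfo dump with a few standard tools. This is a minimal sketch, not something from Uwe's setup; the function name avx512_flags and its optional file argument are made up for illustration, and it assumes a Linux-style /proc/cpuinfo with a line starting with "flags".

```shell
#!/bin/sh
# Print the distinct AVX-512 feature flags from a cpuinfo-style file.
# The function name and the optional path argument are illustrative only.
avx512_flags() {
    # Take the first "flags" line (all cores normally report the same set),
    # split it into one token per line, keep tokens starting with "avx512".
    sed -n '/^flags/{p;q;}' "${1:-/proc/cpuinfo}" \
        | tr ' \t' '\n\n' \
        | grep '^avx512' \
        | LC_ALL=C sort -u
}

# Only query the live system when the file is actually readable.
if [ -r /proc/cpuinfo ]; then
    avx512_flags
fi
```

On a machine without AVX-512 the function simply prints nothing, so an empty result is the "not supported" case.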