Woohoo! Thanks, Uwe; it's exciting that you were able to get 2x the
lifespan out of the drives.  Let's go for 4x this time!

On Tue, Mar 18, 2025 at 12:53 PM Uwe Schindler <u...@thetaphi.de> wrote:
>
> Moin moin,
>
> Policeman Jenkins got new hardware yesterday - no functional changes.
>
> Background: The old server had some strange problems with the network
> adapter (Intel's "igb" kernel driver), which kept logging "Detected Tx Unit
> Hang". This caused some short downtimes, and the monitoring complained all
> the time about lost pings, which drove me crazy at the weekend. It worked
> better after a restart and also with a kernel downgrade, but as I was about
> to replace the machine with a newer one anyway, I ordered a replacement
> with the new hardware version (previously it was Hetzner AX51-NVME; now it
> is Hetzner AX52).
>
> The migration started yesterday at lunch time in Europe (12:00 CET). Both
> servers were booted from the network into the datacenter's recovery
> environment with a temporary IP; then I mounted both root disks and did a
> large rsync (with checksums, extended attributes, numeric uid/gid, and the
> delete option). Luckily this worked with the old server (the Intel adapter
> did not break). The whole downtime should have taken only 1 to 1.5 hours
> (the time that the copy over 1 Gbit/s and the reconfiguration need), but
> unfortunately PCI Express on the new server complained about (recoverable)
> errors in the NVME communication. After some support round trips (they
> first replaced only the failing NVME controller, which did not help), they
> replaced the whole server.
>
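
For anyone who wants to repeat this kind of disk-to-disk copy, here is a
minimal sketch of how the rsync options described above could be assembled.
This is an assumption on my side, not the command that was actually run: the
mount points /mnt/old and /mnt/new are hypothetical, --archive is my
addition, and the helper only builds the argument list (with --dry-run on by
default) rather than executing anything.

```python
import shlex


def build_rsync_command(src_root, dst_root, dry_run=True):
    """Build an rsync argv list for copying one mounted root disk to another.

    The mount points passed in are hypothetical; nothing is executed here.
    """
    args = [
        "rsync",
        "--archive",      # assumption: recurse, keep permissions/times/symlinks
        "--checksum",     # compare files by checksum, not size and mtime
        "--xattrs",       # copy extended attributes
        "--numeric-ids",  # keep raw uid/gid numbers, no user-name mapping
        "--delete",       # remove target files that no longer exist in source
    ]
    if dry_run:
        args.append("--dry-run")  # only report what would be transferred
    # A trailing slash on the source means "the contents of this directory".
    args += [src_root.rstrip("/") + "/", dst_root.rstrip("/") + "/"]
    return args


if __name__ == "__main__":
    print(shlex.join(build_rsync_command("/mnt/old", "/mnt/new")))
```

Numeric ids matter in a recovery environment because its user database
differs from the one on the disks being copied.
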
> At 18:30 CET, I started the copy to the new server again and all went well;
> dmesg showed no PCI Express checksum errors. Finally, after fixing the boot
> setup (the old server used MBR, the new one EFI), the server was mounted at
> the original location by the team, and all IPv4 addresses and the IPv6
> network were available. Since then (approx. 20:30 CET), Policeman Jenkins
> is back up and running.
>
> The TODOs for the future:
>
> - Replace the macOS VM and update it to a new version (it's complicated, as
>   it is a "Hackintosh", so it shouldn't be there according to Apple).
> - Possibly migrate away from VirtualBox to KVM, but it's unclear if
>   Hackintoshes work there.
>
> Have fun with the new hardware, the builds on Lucene main branch are now 1.5 
> times faster (10 instead of 15 minutes).
>
> The new hardware is described here: 
> https://www.hetzner.com/dedicated-rootserver/ax52/; it has AVX-512... let's
> see what comes out. No test failures yet.
>
> vendor_id       : AuthenticAMD
> cpu family      : 25
> model           : 97
> model name      : AMD Ryzen 7 7700 8-Core Processor
> stepping        : 2
> microcode       : 0xa601209
> cpu MHz         : 5114.082
> cache size      : 1024 KB
> physical id     : 0
> siblings        : 16
> core id         : 7
> cpu cores       : 8
> apicid          : 15
> initial apicid  : 15
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 16
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc 
> cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 
> sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic 
> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce 
> topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 
> hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase 
> bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap 
> avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec 
> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk 
> avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv 
> svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter 
> pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke 
> avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq 
> rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
> bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
> bogomips        : 7585.28
> TLB size        : 3584 4K pages
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 48 bits physical, 48 bits virtual
> power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
>
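
In case it is useful for checking other build machines, here is a small
sketch that pulls the AVX-512 feature flags out of /proc/cpuinfo-style text.
The function name avx512_flags is made up for illustration. It assumes the
flags sit on a single "flags : ..." line, as they do in the real file (the
quoted output above is only wrapped by the mail client).

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []  # no 'flags' line found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(avx512_flags(fh.read()))
    except FileNotFoundError:
        print("no /proc/cpuinfo on this system")
```

An empty list means either the CPU has no AVX-512 support or the text
contained no flags line at all.
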
> # lspci | fgrep -i volati
> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD 
> Controller PM9A1/PM9A3/980PRO
> 02:00.0 Non-Volatile memory controller: Micron Technology Inc 3400 NVMe SSD 
> [Hendrix]
>
> I have no idea why the replacement server has two different NVME SSDs, but
> you never know beforehand what you will get! From the SMART info I know
> that both SSDs were fresh (only 6 hours of total uptime).
>
> Uwe
>
> --
>
> Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: 
> u...@thetaphi.de

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org
