The 10x write factor is probably logs. Solr writes a lot of logs. Most tests write and read very little data themselves.
On Wed, Mar 19, 2025 at 8:16 AM Michael Sokolov <msoko...@gmail.com> wrote:
> Woohoo! Thanks Uwe; exciting you were able to get 2x the lifespan of
> the drives. Let's go for 4x this time!
>
> On Tue, Mar 18, 2025 at 12:53 PM Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > Moin moin,
> >
> > Policeman Jenkins got new hardware yesterday - no functional changes.
> >
> > Background: The old server had some strange problems with the networking
> > adaptor (Intel's "igb" kernel driver), reporting "Detected Tx Unit Hang".
> > This caused some short downtimes, and the monitoring complained all the
> > time about lost pings, which drove me crazy over the weekend. It worked
> > better after a restart and also after a kernel downgrade, but as I was
> > about to replace the machine with a newer one anyway, I ordered a
> > replacement on the new hardware generation (previously it was a Hetzner
> > AX51-NVME; now it is a Hetzner AX52).
> >
> > The migration started yesterday at lunchtime in Europe (12:00 CET):
> > both servers were booted into the datacenter's recovery environment
> > (network-booted, with temporary IPs), then I mounted both root disks and
> > did a large rsync (with checksums, extended attributes, numeric uid/gid,
> > and the delete option). Luckily this worked with the old server (the
> > Intel adapter did not break). The whole downtime should have taken only
> > 1 to 1.5 hours (the time the copy at 1 GBit/s plus reconfiguration
> > needs), but unfortunately PCI Express on the new server reported
> > (recoverable) errors on the NVMe link. After some support round trips
> > (they first replaced only the failing NVMe controller, which did not
> > help), they replaced the whole server.
> >
> > At 18:30 CET, I started the copy to the new server again and all went
> > well; dmesg showed no PCI Express checksum errors. Finally, after fixing
> > the boot setup (the old server used MBR, the new one EFI), the server
> > was mounted at its original location by the team and all IPv4 addresses
> > and the IPv6 network were available.
> > Since then (approx. 20:30 CET), Policeman Jenkins has been back up and
> > running.
> >
> > The TODOs for the future:
> >
> > - Replace the macOS VM and update it to a newer version (it's
> >   complicated, as it is a "Hackintosh", so according to Apple it
> >   shouldn't exist at all).
> > - Possibly migrate away from VirtualBox to KVM, but it's unclear
> >   whether Hackintoshes work there.
> >
> > Have fun with the new hardware; builds on the Lucene main branch are now
> > 1.5 times faster (10 instead of 15 minutes).
> >
> > The new hardware is described here:
> > https://www.hetzner.com/dedicated-rootserver/ax52/; it has AVX-512...
> > let's see what comes out. No test failures yet.
> >
> > vendor_id       : AuthenticAMD
> > cpu family      : 25
> > model           : 97
> > model name      : AMD Ryzen 7 7700 8-Core Processor
> > stepping        : 2
> > microcode       : 0xa601209
> > cpu MHz         : 5114.082
> > cache size      : 1024 KB
> > physical id     : 0
> > siblings        : 16
> > core id         : 7
> > cpu cores       : 8
> > apicid          : 15
> > initial apicid  : 15
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 16
> > wp              : yes
> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
> > bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
> > bogomips        : 7585.28
> > TLB size        : 3584 4K pages
> > clflush size    : 64
> > cache_alignment : 64
> > address sizes   : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
> >
> > # lspci | fgrep -i volati
> > 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> > 02:00.0 Non-Volatile memory controller: Micron Technology Inc 3400 NVMe SSD [Hendrix]
> >
> > I have no idea why the replacement server has two different NVMe SSDs,
> > but you never know beforehand what you get! From the SMART info I know
> > that both SSDs were fresh (only 6 hours total uptime).
> >
> > Uwe
> >
> > --
> >
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
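[Editor's note: the "6 hours total uptime" figure Uwe mentions typically comes from smartmontools, e.g. `smartctl -a /dev/nvme0n1` on an NVMe device. Since no real device is assumed here, the sketch below extracts the power-on-hours field from an invented sample in the shape of `smartctl` NVMe output; the values and field spacing are illustrative only.]

```shell
# Sample fragment shaped like `smartctl -a` NVMe health output
# (values invented for illustration, not from the real server):
sample='Percentage Used:                    0%
Power On Hours:                     6
Unsafe Shutdowns:                   1'

# Pull out the numeric power-on-hours value: split on ":", strip spaces.
hours=$(printf '%s\n' "$sample" | awk -F':' '/Power On Hours/ {gsub(/ /, "", $2); print $2}')
echo "$hours"   # prints: 6
```

On a fresh drive this counter is near zero, which is how one can tell both replacement SSDs were new.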