The 10x write factor is probably logs.  Solr writes a lot of logs.  Most
tests need very little data to write & read.

On Wed, Mar 19, 2025 at 8:16 AM Michael Sokolov <msoko...@gmail.com> wrote:

> Woohoo! thanks Uwe; exciting you were able to get 2x the lifespan of
> the drives.  Let's go for 4x this time!
>
> On Tue, Mar 18, 2025 at 12:53 PM Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > Moin moin,
> >
> > Policeman Jenkins got new hardware yesterday - no functional changes.
> >
> > Background: The old server had some strange problems with its network adapter (Intel's "igb" kernel driver), logging "Detected Tx Unit Hang" errors. This caused some short downtimes, and the monitoring complained all the time about lost pings, which drove me crazy over the weekend. It worked better after a restart and with a kernel downgrade, but as I was about to replace the machine with a newer one anyway, I ordered a replacement on the new hardware version (previously a Hetzner AX51-NVME; now a Hetzner AX52).
> >
> > The migration started yesterday at lunchtime European time (12:00 CET): both servers were booted over the network into the datacenter's recovery environment with temporary IPs, then both root disks were mounted and a large rsync was done (with checksums, extended attributes, numeric uid/gid, and the delete option). Luckily this worked with the old server (the Intel adapter did not break). The whole downtime should have been only 1 to 1.5 hours (the time the copy at 1 GBit/s plus reconfiguration needs), but unfortunately PCI Express on the new server complained about (recoverable) errors in the NVMe communication. After some support roundtrips (they first replaced only the failing NVMe controller, which did not help), they replaced the whole server.
> >
> > At 18:30 CET, I started the copy to the new server again and all went well; dmesg showed no PCI Express checksum errors. Finally, after fixing the boot setup (the old server used MBR, the new one EFI), the server was mounted at its original location by the datacenter team, and all IPv4 addresses and the IPv6 network were available. Since then (approx. 20:30 CET), Policeman Jenkins has been back up and running.
> >
> > The TODOs for the future:
> >
> > Replace the macOS VM and update it to a new version (it's complicated, as it is a "Hackintosh", so according to Apple it shouldn't exist at all)
> > Possibly migrate away from VirtualBox to KVM, but it's unclear whether Hackintoshes work there.
> >
> > Have fun with the new hardware; the builds on the Lucene main branch are now 1.5 times faster (10 instead of 15 minutes).
> >
> > The new hardware is described here: https://www.hetzner.com/dedicated-rootserver/ax52/; it has AVX-512... let's see what comes out. No test failures yet.
> >
> > vendor_id       : AuthenticAMD
> > cpu family      : 25
> > model           : 97
> > model name      : AMD Ryzen 7 7700 8-Core Processor
> > stepping        : 2
> > microcode       : 0xa601209
> > cpu MHz         : 5114.082
> > cache size      : 1024 KB
> > physical id     : 0
> > siblings        : 16
> > core id         : 7
> > cpu cores       : 8
> > apicid          : 15
> > initial apicid  : 15
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 16
> > wp              : yes
> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology
> nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3
> fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm
> cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw
> ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx
> cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp
> ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a
> avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni
> avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
> cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr
> rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
> flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic
> v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes
> vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov
> succor smca fsrm flush_l1d amd_lbr_pmc_freeze
> > bugs            : sysret_ss_attrs spectre_v1 spectre_v2
> spec_store_bypass srso
> > bogomips        : 7585.28
> > TLB size        : 3584 4K pages
> > clflush size    : 64
> > cache_alignment : 64
> > address sizes   : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
> >
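[For reference: the AVX-512 sub-features buried in the "flags" line above can be pulled out with a one-liner. On the live machine one would read /proc/cpuinfo; the sketch below uses a shortened sample string so it is self-contained, and that sample is an illustration, not the full flag set of the Ryzen 7 7700.]

```shell
# List the AVX-512 sub-features found in a cpuinfo-style "flags" string.
# "flags" here is a shortened sample, not the live /proc/cpuinfo contents.
flags="fpu sse2 avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512_bf16 avx512_vnni"
echo "$flags" | tr ' ' '\n' | grep '^avx512' | LC_ALL=C sort
```

On a real system, replace the sample string with `grep '^flags' /proc/cpuinfo | head -1`.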
> > # lspci | fgrep -i volati
> > 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe
> SSD Controller PM9A1/PM9A3/980PRO
> > 02:00.0 Non-Volatile memory controller: Micron Technology Inc 3400 NVMe
> SSD [Hendrix]
> >
> > I have no idea why the replacement server has two different NVMe SSDs, but you never know beforehand what you will get! From the SMART info I know that both SSDs were fresh (only 6 hours total power-on time).
> >
> > Uwe
> >
> > --
> >
> > Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
