Re: Schrödinger's hash

David Christensen Fri, 22 May 2026 16:26:17 -0700

On 5/22/26 12:43, nwe wrote:

On 5/22/26 1:32 PM, Van Snyder wrote:
I remarked to a local computer repair shop that I have a four TB backup drive. He said, "Replace it. Four TB isn't ready yet."
How so? I thought 4 TB was already showing its age...



+1. I am also curious why 4 TB HDDs are not "ready yet".

I'm running 12x 4 TB drives, all used SAS drives. Accumulated power-on time ranges from 40,166 to 73,439 hours.
smartctl reports that device /dev/sdf is worsening, with read errors increasing over time. That drive shows 73,408 hours powered up, 72,360.67 GB read, 119,545.193 GB written, and 195 power cycles (13 since July 13, 2024). Its defect list has grown from 3 to 6,872 entries.
Two other drives have defect lists of 23 and 14 entries, respectively; all the others are at 0. Given that, and the age of the pool, I should probably prioritize replacing at least sdf soon, so that I do not lose redundancy during a resilver.
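A rough sketch of how the "which drive first" decision could be scripted across all twelve drives. It assumes the defect figures above come from the "Elements in grown defect list" line and the hours from the "Accumulated power on time, hours:minutes" line that smartctl -a prints for SAS drives; those label strings are my assumption, so check them against your smartctl version. The function names and the threshold are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed label formats from `smartctl -a` on SAS/SCSI drives; verify locally.
DEFECT_RE = re.compile(r"Elements in grown defect list:\s*(\d+)")
POWER_ON_RE = re.compile(r"Accumulated power on time, hours:minutes\s+(\d+):(\d+)")


@dataclass
class DriveHealth:
    device: str
    grown_defects: int
    power_on_hours: float


def parse_report(device: str, text: str) -> DriveHealth:
    """Pull the grown defect count and power-on hours out of one report.

    Missing fields are reported as 0, so a parsing failure looks healthy;
    check the raw output of any drive that unexpectedly shows 0.
    """
    defects = DEFECT_RE.search(text)
    power = POWER_ON_RE.search(text)
    hours = int(power.group(1)) + int(power.group(2)) / 60 if power else 0.0
    return DriveHealth(device, int(defects.group(1)) if defects else 0, hours)


def replacement_order(reports: dict[str, str], threshold: int = 0) -> list[DriveHealth]:
    """Drives whose grown defect list exceeds `threshold`, worst first;
    ties are broken by power-on hours (older first)."""
    drives = [parse_report(dev, text) for dev, text in reports.items()]
    flagged = [d for d in drives if d.grown_defects > threshold]
    return sorted(flagged, key=lambda d: (d.grown_defects, d.power_on_hours),
                  reverse=True)
```

Feed it a dict of device name to the text of `smartctl -a /dev/<device>`; with the numbers above, sdf (6,872 entries) would sort ahead of the drives at 23 and 14.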

I am still trying to understand the smartctl(8) "SMART Attributes Data Structure". The RAW_VALUE seems to be a binary bit field (?) for several attributes and is useless without manufacturer engineering data. The VALUE column is a normalized, vendor-defined score: it commonly starts at 100 (some vendors use 200 or 253) and goes down as the disk wears out, and an attribute counts as failed once VALUE drops to or below THRESH:

* Raw_Read_Error_Rate, Seek_Error_Rate, and Hardware_ECC_Recovered can have low VALUE numbers, but the disk seems to keep working.

* Low VALUE numbers for Reallocated_Sector_Ct, Current_Pending_Sector, and/or Offline_Uncorrectable seem to be reliable indicators of a failing disk.

* I have not seen a VALUE number other than 100 for End-to-End_Error.
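The rules of thumb above could be applied mechanically to the attribute table from `smartctl -A`. This is only a sketch: it assumes the usual ten-column ATA table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), treats "VALUE at or below a nonzero THRESH" as the failure test, and flags any nonzero raw count for the three attributes I consider reliable. The function names are hypothetical.

```python
# Assumed column order of the ATA attribute table printed by `smartctl -A`.
COLUMNS = ["id", "name", "flag", "value", "worst", "thresh",
           "type", "updated", "when_failed", "raw"]

# Counters treated above as reliable signs of a failing disk.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(table: str) -> list[dict[str, str]]:
    """One dict per attribute row; RAW_VALUE may itself contain spaces."""
    rows = []
    for line in table.splitlines():
        fields = line.split(maxsplit=len(COLUMNS) - 1)
        # Data rows start with a numeric ID; the header row starts with "ID#".
        if len(fields) == len(COLUMNS) and fields[0].isdigit():
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def smart_warnings(table: str) -> list[str]:
    out = []
    for row in parse_attributes(table):
        value, thresh = row["value"], row["thresh"]
        # Failure test: normalized VALUE at or below a nonzero THRESH.
        if (value.isdigit() and thresh.isdigit()
                and int(thresh) > 0 and int(value) <= int(thresh)):
            out.append(f"{row['name']}: VALUE {int(value)} <= THRESH {int(thresh)}")
        # For the watched counters, any nonzero raw count deserves attention.
        raw_head = row["raw"].split()[0]
        if row["name"] in WATCHED and raw_head.isdigit() and int(raw_head) > 0:
            out.append(f"{row['name']}: raw count {int(raw_head)}")
    return out
```

Raw values for attributes like Raw_Read_Error_Rate are deliberately ignored here, since they may be packed vendor-specific fields.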

nwe@srv01:~$ zpool status -c vendor,model,size
  pool: POOL1
 state: ONLINE
  scan: scrub repaired 0B in 04:10:05 with 0 errors on Sun May 10 04:34:06 2026
config:

NAME        STATE     READ WRITE CKSUM   vendor         model  size
POOL1       ONLINE       0     0     0
  raidz3-0  ONLINE       0     0     0
    sdb     ONLINE       0     0     0  SEAGATE  ST4000NM0023  3.6T
    sdc     ONLINE       0     0     0  TOSHIBA   MG04SCA40EN  3.6T
    sdd     ONLINE       0     0     0  TOSHIBA   MG04SCA40EN  3.6T
    sde     ONLINE       0     0     0  SEAGATE  ST4000NM0023  3.6T
    sdf     ONLINE       0     0     0  SEAGATE  ST4000NM0023  3.6T
    sdh     ONLINE       0     0     0       HP   MB4000FCWDK  3.6T
    sdg     ONLINE       0     0     0       HP   MB4000FCWDK  3.6T
    sdi     ONLINE       0     0     0  SEAGATE  ST4000NM0023  3.6T
    sdj     ONLINE       0     0     0  SEAGATE  ST4000NM0023  3.6T
    sdk     ONLINE       0     0     0  SEAGATE  ST4000NM0023  3.6T
    sdl     ONLINE       0     0     0       HP   MB4000FCWDK  3.6T
    sdm     ONLINE       0     0     0       HP   MB4000FCWDK  3.6T

errors: No known data errors

Twelve disks give you many choices for how to lay out the pool and trade off redundancy vs. capacity vs. performance. Is the data balanced across disks? Does the machine have enough memory? Is the ARC working well?
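To put rough numbers on that trade-off, here is a sketch comparing a few 12-disk layouts. Assumptions: 4 TB (decimal) disks, which zpool reports as 3.6T because it counts in TiB; the list of layouts is my own selection; capacity is raw data space only, ignoring raidz padding, slop space, and metadata, so real usable space is lower. Counting top-level vdevs as a proxy for random IOPS is a rule of thumb, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int   # number of top-level vdevs
    width: int   # disks per vdev
    parity: int  # raidz parity level, or width - 1 for an N-way mirror

    @property
    def disks(self) -> int:
        return self.vdevs * self.width

    def usable_tb(self, disk_tb: float) -> float:
        # Raw data capacity only: no raidz padding, slop space, or metadata.
        return self.vdevs * (self.width - self.parity) * disk_tb

    @property
    def guaranteed_failures(self) -> int:
        # Worst case: every failure lands in the same vdev.
        return self.parity


LAYOUTS = [
    Layout("1 x raidz3 (12 wide)", 1, 12, 3),
    Layout("2 x raidz2 (6 wide)", 2, 6, 2),
    Layout("3 x raidz1 (4 wide)", 3, 4, 1),
    Layout("6 x 2-way mirror", 6, 2, 1),
    Layout("4 x 3-way mirror", 4, 3, 2),
]

if __name__ == "__main__":
    for lay in LAYOUTS:
        print(f"{lay.name:22} {lay.usable_tb(4):5.0f} TB usable, "
              f"survives any {lay.guaranteed_failures} failure(s), "
              f"{lay.vdevs} vdev(s)")
```

The current raidz3 layout maximizes guaranteed fault tolerance at 36 TB raw; six 2-way mirrors give up a third of that capacity for six vdevs' worth of IOPS.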

On two of my earlier pools, I added a 60 GB SSD as a cache vdev after the pools had data. I did not notice any improvement.

On one of my earlier pools, a single mirror of two 3 TB HDDs that was nearly full, I added a second mirror of two 3 TB HDDs. I did not notice any improvement.
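A toy model of why adding a second mirror to a nearly full pool changes little: ZFS does not move existing data when a vdev is added, so old data is still read only from the original mirror. The proportional split below is my simplification; the real allocator biases new writes toward emptier vdevs, but its behavior is more involved. The function name and the example sizes are hypothetical.

```python
def fill_after_expansion(old_used: float, old_size: float, new_size: float,
                         written: float, step: float = 0.01) -> tuple[float, float]:
    """Toy model: split new writes between an old and a new vdev in proportion
    to each vdev's free space, recomputed every `step` units.

    Existing data never moves (ZFS does not rebalance on its own); the
    proportional split only stands in for the real allocator's bias.
    """
    used = [old_used, 0.0]
    size = [old_size, new_size]
    if written > sum(size) - sum(used):
        raise ValueError("writes exceed the pool's free space")
    remaining = written
    while remaining > 1e-12:
        chunk = min(step, remaining)
        free = [s - u for s, u in zip(size, used)]
        total_free = sum(free)
        for i in range(2):
            used[i] += chunk * free[i] / total_free
        remaining -= chunk
    return used[0], used[1]
```

For example, with 2.6 TB used on a 2.7 TB mirror, adding an empty 2.7 TB mirror and writing 0.5 TB more leaves well under a fifth of the data on the new mirror in this model, so most reads still hit the original pair of disks.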

I rebuilt the storage pool with two mirrors of two 3 TB HDDs each and a special vdev mirror of two 180 GB SSDs. I also set special_small_blocks=16K. I then restored the data via replication. The data is now balanced across disks, latency has dropped, throughput has increased, and overall performance is noticeably better:
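For anyone choosing a special_small_blocks value, here is a simplified sketch of the placement rule and of a file-size survey to size it. Assumptions, all mine: metadata goes to the special class; a data block goes there when its size is at or below special_small_blocks; spill-over to the normal vdevs when the special class fills up is not modeled; and I am not certain whether ZFS compares the compressed or uncompressed block size, so compression is ignored. The survey also assumes each candidate threshold is below recordsize, so that only files no larger than the threshold (stored as a single small block) land on the special vdev. Both function names are hypothetical.

```python
import os
import stat

KIB = 1024


def allocation_class(block_size: int, is_metadata: bool,
                     special_small_blocks: int) -> str:
    """Simplified placement rule for a pool that has a special vdev."""
    if is_metadata:
        return "special"  # metadata goes to the special class
    if 0 < special_small_blocks and block_size <= special_small_blocks:
        return "special"  # small file/zvol data blocks
    return "normal"


def small_file_bytes(root: str, thresholds: list[int]) -> dict[int, int]:
    """Total bytes of regular files no larger than each candidate threshold.

    Approximation: a file no larger than recordsize is stored as one block of
    roughly its own size; larger files use recordsize blocks, which exceed every
    threshold below recordsize. Hard-linked files are counted once per link.
    """
    totals = {t: 0 for t in thresholds}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and the like
            for t in thresholds:
                if st.st_size <= t:
                    totals[t] += st.st_size
    return totals
```

Comparing the totals for, say, 16K, 32K, and 64K against the special mirror's free space shows how much headroom a larger value would use. Keep the value below recordsize (128K by default); otherwise every data block qualifies and the special vdev fills with bulk data.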


2026-05-22 15:12:45 toor@f5 ~
# freebsd-version
13.5-RELEASE-p12

2026-05-22 15:19:47 toor@f5 ~
# zpool iostat -v p5
                    capacity     operations     bandwidth
pool              alloc   free   read  write   read  write
----------------  -----  -----  -----  -----  -----  -----
p5                3.76T  1.82T      6      1  3.68M  32.2K
  mirror-0        1.87T   871G      2      0  1.82M  4.48K
    gpt/hdd0.eli      -      -      1      0   931K  2.24K
    gpt/hdd1.eli      -      -      1      0   931K  2.24K
  mirror-1        1.86T   876G      2      0  1.81M  4.35K
    gpt/hdd2.eli      -      -      1      0   928K  2.18K
    gpt/hdd3.eli      -      -      1      0   928K  2.18K
special               -      -      -      -      -      -
  mirror-2        31.1G   118G      1      1  51.2K  23.3K
    gpt/ssd0.eli      -      -      0      0  25.6K  11.7K
    gpt/ssd1.eli      -      -      0      0  25.6K  11.7K
----------------  -----  -----  -----  -----  -----  -----

2026-05-22 15:32:42 toor@f5 ~
# top -d 1 | head -n 7
last pid: 57622; load averages: 0.24, 0.21, 0.17 up 24+22:47:05  15:32:45
27 processes:  1 running, 26 sleeping
CPU:  0.0% user,  0.0% nice,  0.6% system,  0.0% interrupt, 99.4% idle
Mem: 4848K Active, 330M Inact, 856K Laundry, 14G Wired, 920M Buf, 694M Free
ARC: 12G Total, 10G MFU, 485M MRU, 3328K Anon, 200M Header, 899M Other
     9921M Compressed, 33G Uncompressed, 3.36:1 Ratio
Swap: 764M Total, 764M Free

2026-05-22 15:33:12 toor@f5 ~
# arc_summary | grep -A 5 "ARC total accesses"
ARC total accesses (hits + misses):                               512.7M
        Cache hit ratio:                               99.8 %     511.8M
        Cache miss ratio:                               0.2 %     886.5k
        Actual hit ratio (MFU + MRU hits):             99.3 %     509.2M
        Data demand efficiency:                        99.5 %       4.8M
        Data prefetch efficiency:                      19.2 %      96.9k


In hindsight:

1. I gathered file system statistics prior to rebuilding the pool and setting special_small_blocks=16K, but it now appears I could have used a larger value.

2. If I get worried about HDDs failing, I can add disks to the pool as spares and/or add disks to the data mirrors. The latter should increase read performance even more.

3. My ~10-year-old HDDs can already saturate Gigabit Ethernet with sequential I/O. RAID 10 with SSD acceleration is even more overkill. I want 10 GbE.
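Why Gigabit is the bottleneck, in numbers: a sketch of best-case TCP payload rate in one direction versus how many disks streaming in parallel it takes to fill the link. Assumptions: standard 1500-byte MTU, no VLAN tag, IPv4 with TCP timestamps (52 bytes of IP+TCP headers), and a per-disk sequential rate that you supply; the 150 MB/s in the note below is a hypothetical figure, not a measurement of these disks.

```python
import math

# Per-frame Ethernet overhead on the wire, in bytes: header 14, FCS 4,
# preamble + start-of-frame delimiter 8, inter-frame gap 12 (no VLAN tag).
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12
# IPv4 header 20 + TCP header 20 + TCP timestamp option 12.
TCP_IP_HEADERS = 20 + 20 + 12


def tcp_goodput_mb_s(link_gbit: float, mtu: int = 1500) -> float:
    """Best-case TCP payload rate in MB/s (10**6 bytes/s), one direction."""
    line_rate = link_gbit * 1e9 / 8 / 1e6  # MB/s of raw bits on the wire
    return line_rate * (mtu - TCP_IP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def disks_to_saturate(link_gbit: float, disk_mb_s: float) -> int:
    """Disks streaming in parallel needed to fill the link."""
    return math.ceil(tcp_goodput_mb_s(link_gbit) / disk_mb_s)
```

This gives about 117.7 MB/s for 1 GbE and about 1177 MB/s for 10 GbE. At a hypothetical 150 MB/s per disk, one disk fills Gigabit while 10 GbE would need eight streaming in parallel.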



David
