I noticed the SSD lifespan numbers dropping rapidly on some servers -
losing about a percent every 1-2 days. The SMART wear leveling count was
incrementing roughly once an hour.
At that rate, figure about 125 days for a new SSD to reach 0% lifespan remaining.
vmstat was reporting a steady 2 MB/s of writes (1 MB/s per SSD - the pair
is an mdadm mirror).
That works out to a write amplification factor of roughly 57x - the SSD
ends up writing about 57 times that amount of data to flash, because each
small write still costs at least a full flash page/erase-block rewrite.
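For what it's worth, the rough arithmetic behind that figure looks like this
(a Python sketch; the 2400 rated P/E cycles is my assumption for a consumer
drive of this size, not something taken from the datasheet):

    # Back-of-the-envelope check of the ~57x figure. Sketch only: the rated
    # P/E cycle count is an assumption, not a measured value.
    capacity_gb = 256            # per SSD
    rated_pe_cycles = 2400       # assumed endurance rating for this class of drive
    host_write_rate_mb_s = 1.0   # host writes per SSD, from vmstat
    days_to_zero = 125           # observed ~1% lifespan lost every 1.25 days

    nand_writes_gb = capacity_gb * rated_pe_cycles                       # flash the drive can absorb
    host_writes_gb = host_write_rate_mb_s * 86400 * days_to_zero / 1000  # host data written in that time

    print(round(nand_writes_gb / host_writes_gb))   # -> 57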
Stopping fail2ban stopped the writes.
SSD: 256GB, 11% used, fstrim once a day.
RAM: 24GB, 1GB used.
Swap: 4GB, 0% used.
Starting fail2ban... after 5 minutes the solid 2MB/s writes resumed.
Fail2ban is monitoring ssh and apache logs.
fail2ban was using a sqlite DB... about 130 kB in size.
Changing the configuration to :memory: fixed the problem - writes all but
disappeared.
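(If anyone wants to try the same thing: I believe the relevant knob is the
dbfile setting in /etc/fail2ban/fail2ban.conf, set to :memory: in place of
the default /var/lib/fail2ban/fail2ban.sqlite3 path. The obvious trade-off
is that ban state is lost across restarts.)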
Looking at sqlite...
1) The sqlite DB is causing high disk writes - ~600 kB/s to disk for a
130 kB DB file.
2) The sqlite DB appears to be writing very small chunks of data, resulting
in a high write amplification factor.
3) The sqlite DB appears to be forcing syncs to disk at a high rate.
Rewriting part or all of a 130 kB file should be absorbed by the page cache
and result in very little physical write load.
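To illustrate what I think is happening (a toy sketch only - the table name
and values are invented, and fail2ban's actual schema and commit pattern
will differ):

    import sqlite3, time

    # With sqlite's defaults (rollback journal, journal_mode=DELETE,
    # synchronous=FULL) every commit rewrites the journal file and fsync()s
    # both it and the main DB, so a few bytes of logical change turns into
    # several full pages hitting the disk.
    conn = sqlite3.connect("/tmp/waf-demo.sqlite3")
    conn.execute("CREATE TABLE IF NOT EXISTS bans (ip TEXT, ts REAL)")

    for i in range(60):
        conn.execute("INSERT INTO bans VALUES (?, ?)", ("192.0.2.%d" % i, time.time()))
        conn.commit()   # one journal-write + fsync cycle per tiny row

    conn.close()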
Question - does fail2ban have any control over whether sqlite forces a sync
to disk, or over how often it does so?
I suspect configuring sqlite not to force a sync to disk, or to sync at a
slower rate, say once every 5 seconds, would solve the problem.
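If fail2ban doesn't expose anything, the kind of change I have in mind looks
like one of these (a sketch, not fail2ban's actual code; the table and
helper names are invented):

    import sqlite3, time

    conn = sqlite3.connect("/tmp/waf-demo.sqlite3")
    conn.execute("CREATE TABLE IF NOT EXISTS bans (ip TEXT, ts REAL)")

    # Option 1: relax sqlite's sync policy. WAL plus synchronous=NORMAL
    # syncs far less often than the default FULL; synchronous=OFF leaves
    # flushing entirely to the kernel (riskier on power loss).
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")

    # Option 2: batch work in the application and commit (and therefore
    # sync) at most once every few seconds instead of once per event.
    pending = []
    last_commit = time.monotonic()

    def record(ip):
        pending.append((ip, time.time()))

    def flush_if_due(interval=5.0):
        global last_commit
        if pending and time.monotonic() - last_commit >= interval:
            conn.executemany("INSERT INTO bans VALUES (?, ?)", pending)
            conn.commit()
            pending.clear()
            last_commit = time.monotonic()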
Meanwhile I need to start replacing SSDs :-(
Nick