On Aug 5, 2021, at 09:28, Nathan Dauchy - NOAA Affiliate via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

Greetings ext4 and flash storage experts!

Motivation:  We have ldiskfs OSTs that are primarily HDDs and use a flash
device as an external journal.  Recent IOR benchmarks showed that write
performance dropped (suddenly?) to about 25% of the original baseline, yet read 
performance remains fine.  We saw similar characteristics at one point in the 
past when the OSTs were mounted without the external flash journal enabled.  I 
verified that the journals are currently enabled, but the past experience still 
led me to question whether the journals were performing well.

Question:  Is it possible that a flash journal device on an ext4 filesystem can
reach a point where there are not enough clean blocks to write to, and then
suffer from severely degraded write performance?

For the external journal device, this _shouldn't_ happen, in the sense that the 
writes to this device are pretty much always sequential (except updates to the 
journal superblock), so as long as there is an erase block that can be cleaned 
in advance of the next overwrite, it should be essentially "ideal" usage for 
flash.

I know that "fstrim" can be run for mounted ldiskfs file systems, but when I 
try that it doesn't see the OSTs as using flash, because they are primarily 
HDD-based.  Is there some other way to tell the system which blocks can be 
discarded on the journal flash device?  (I found "blkdiscard" but that seems 
heavyweight and dangerous.)

I don't _think_ you can run fstrim against the journal device directly while it
is mounted.  However, you could unmount the filesystem cleanly (which flushes
everything from the journal; check that the "needs_recovery" feature is not
set), remove the journal from the filesystem, trim/discard the journal block
device, then reformat it as an external journal device again and add it back to
the filesystem.
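
The sequence described above can be sketched as a dry-run plan.  This is only
an illustration under stated assumptions, not a tested procedure: the device
paths and mount point are placeholders, the 4096-byte block size is an
assumption (an external journal has to be created with the same block size as
the filesystem that uses it), and the helper names are made up for this
example.  The script prints the commands and runs nothing.

```python
import shlex


def has_needs_recovery(dumpe2fs_header: str) -> bool:
    """Return True if the 'Filesystem features:' line of `dumpe2fs -h`
    output lists needs_recovery (journal not cleanly flushed)."""
    for line in dumpe2fs_header.splitlines():
        if line.startswith("Filesystem features:"):
            return "needs_recovery" in line.split(":", 1)[1].split()
    return False


def journal_reset_plan(ost_dev, journal_dev, mountpoint, block_size=4096):
    """Build the command sequence; all arguments are hypothetical paths."""
    return [
        ["umount", mountpoint],                       # clean unmount flushes the journal
        ["dumpe2fs", "-h", ost_dev],                  # inspect features for needs_recovery
        ["tune2fs", "-O", "^has_journal", ost_dev],   # detach the journal
        ["blkdiscard", journal_dev],                  # discard the whole journal device
        ["mke2fs", "-O", "journal_dev", "-b", str(block_size), journal_dev],
        ["tune2fs", "-J", f"device={journal_dev}", ost_dev],  # re-attach
    ]


if __name__ == "__main__":
    # Placeholder device names; substitute real ones only after review.
    for cmd in journal_reset_plan("/dev/sdX", "/dev/nvmeXn1", "/mnt/ost0"):
        print(shlex.join(cmd))
```

Stop after the dumpe2fs step if has_needs_recovery() returns True for its
output; in that case the journal still holds transactions and must not be
removed or discarded.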

Another related question would be how to benchmark the journal device on its
own, particularly write performance, without losing data on an existing file
system; similar to the very useful obdfilter-survey tool, but at a lower level.
But I am primarily looking to understand the nuances of flash devices and
ldiskfs external journals a bit better.

While the external journal device has an ext4 superblock header for 
identification (UUID/label), and a feature flag that prevents it from being 
mounted/used directly, it is not really an ext4 filesystem, just a flat "file".
You'd need to remove it from the main ext4/ldiskfs filesystem, reformat it as
ext4 and mount locally, and then run benchmarks (e.g. "dd" would best match the 
JBD2 workload, or fio if you want random IOPS) against it.  You could do this 
before/after trim (could use fstrim at this point) to see if it affects the 
performance or not.
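
As a rough stand-in for the "dd"-style test, a sequential write benchmark
could look like the sketch below.  It assumes the former journal device has
already been reformatted and mounted somewhere; the function name and the
example path are hypothetical.  It goes through the page cache and calls
fsync at the end, so it is not a substitute for dd with direct I/O or for
fio, only a quick before/after-trim comparison.

```python
import os
import time


def sequential_write_mib_s(path, total_mib=1024, block_kib=1024):
    """Write total_mib of data sequentially to path and return MiB/s.

    Random data is used so the device cannot shortcut all-zero blocks.
    The file is truncated first and left in place for the caller to remove.
    """
    block = os.urandom(block_kib * 1024)
    count = (total_mib * 1024) // block_kib
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            view = memoryview(block)
            while view:                      # handle partial writes
                n = os.write(fd, view)
                view = view[n:]
                written += n
        os.fsync(fd)                         # include flush-to-device time
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return (written / (1024 * 1024)) / max(elapsed, 1e-9)


# Example call (hypothetical mount point of the reformatted device):
#   rate = sequential_write_mib_s("/mnt/journal-test/bench.dat")
```

Running it once before and once after the trim, with the same sizes, gives
two numbers that can be compared directly.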

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud

