Hi Zachary,
When a compressed file is mmapped, each 4K read in your tests causes the accessed part of the file to be decompressed (at a granularity of 10 GPFS blocks). For usual file sizes, the parts being accessed will be decompressed and IO speed will be normal except for the first 4K IO in each 10-GPFS-block group. For very large files, a large percentage of small random IOs may keep getting amplified to 10-block decompression IOs for a long time. This is probably what happened in your mmap application run.
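To make the amplification concrete, here is a minimal sketch of the access pattern in question — touching one byte per randomly chosen 4K page through mmap. The file path, sizes, and the helper name are illustrative, not from the original report; on a compressed GPFS file, each such touch would force decompression of the whole 10-block compression group containing that page:

```python
import mmap
import os
import random

def touch_random_pages(path, touches, page=4096):
    """Touch one byte in each of `touches` randomly chosen 4K pages.

    Hypothetical stand-in for the app's access pattern: on a compressed
    GPFS file, every touch that misses the cache would trigger
    decompression of an entire 10-block compression group.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        pages = len(mm) // page
        rng = random.Random(42)          # deterministic for the demo
        total = 0
        for _ in range(touches):
            p = rng.randrange(pages)     # pick a random 4K page
            total += mm[p * page]        # one byte is enough to fault it in
        mm.close()
    return total

# Build a small scratch file so the sketch is self-contained.
path = "/tmp/mmap_demo.dat"
with open(path, "wb") as f:
    f.write(b"\x01" * (4096 * 256))      # 1 MiB of 0x01 bytes

print(touch_random_pages(path, 1000))    # each touched byte is 1 -> 1000
os.remove(path)
```

On an ordinary (uncompressed) file each touch faults in just one page; the point above is that on a compressed file the same touch costs a 10-block decompression.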
The suggestion is not to compress files until they have become cold (not likely to be accessed any time soon) and to avoid compressing very large files that may be accessed through mmap later. The product already has built-in protection preventing compression of files that are mmapped at compression time. You can add an exclude rule to the compression policy run for files that are identified to have mmap performance issues (in case they get mmapped after being compressed in a periodic policy run).
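An exclude rule along these lines might look as follows — an illustrative sketch only, not verified against your release; the pool name, path, age threshold, and size cutoff are all placeholders you would adapt:

```sql
/* Compress only cold files; skip a path known to hold mmap workloads
   and skip very large files (names and thresholds are placeholders). */
RULE 'compress-cold' MIGRATE FROM POOL 'data' COMPRESS('z')
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90
    AND NOT (PATH_NAME LIKE '/gpfs/fs1/mmap-workloads/%')
    AND FILE_SIZE < 10737418240   /* skip files of 10 GiB or more */
```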
Leo Luan
From: Zachary Giles <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 02/10/2017 01:57 PM
Subject: [gpfsug-discuss] Questions about mmap GPFS and compression
Sent by: [email protected]
Hello All,
I've been seeing some less-than-desirable behavior with mmap and
compression in GPFS. Curious if others see similar behavior or have any
ideas about whether this analysis is accurate.
The guys here want me to open an IBM ticket, but I figured I'd see if
anyone has had this experience before.
We have an internally developed app that runs on our cluster
referencing data sitting in GPFS. It is using mmap to access the files
due to a library we're using that requires it.
If we run the app against some data on GPFS, it performs well,
finishing in a few minutes' time -- great. However, if we compress the
file (in GPFS), the app is still running after two days.
stracing the app shows that it is polling on a file descriptor forever,
as if a data block is still pending.
I know mmap is supported with compression according to the manual
(with some stipulations), and that performance is expected to be much
lower since access is more large-block oriented, with data decompressed
in groups -- no problem. But it seems like some data should get returned.
I'm surprised to find that a very small amount of data is sitting in
the buffers (mmfsadm dump buffers) in reference to the inodes. The
decompression thread is running continuously, while the app is still
polling for data from memory and sleeping, retrying, sleeping, repeat.
What I believe is happening is that 4k pages are being pulled out
of large decompression groups by an mmap read request and put in the
buffer, then the compression group data is thrown away since GPFS has
the result it wants, only for another piece of data from that same
group to be needed slightly later, which is decompressed again, put in
the buffer, and so on. Thus an effectively infinite slowdown. Perhaps
the data is also expiring out of the buffer before the app has a chance
to read it; I can't tell. In any case, the app makes zero progress.
I tried without our app, using fio: mmap on an uncompressed file with
1 thread, 1 iodepth, random reads, and 4k blocks yields ~76MB/s (not
impressive). However, on a compressed file it is only 20KB/s max
(far less impressive). Reading a file using aio etc. is over 3GB/s on a
single thread without even trying.
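For anyone who wants to reproduce the comparison, the run described above would look something like the following fio invocation (the file name and size are placeholders; run it once against an uncompressed copy and once against a compressed one):

```shell
fio --name=mmap-randread --ioengine=mmap --rw=randread \
    --bs=4k --iodepth=1 --numjobs=1 \
    --filename=/gpfs/fs1/testfile --size=10g \
    --runtime=60 --time_based
```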
What do you think?
Anyone see anything like this? Perhaps there are some tunings to waste
a bit more memory on cached blocks rather than make decompression
recycle?
I've searched back through the archives a bit. There's a May 2013 thread
about slowness as well. I think we're seeing much, much less than that.
Our page pools are of decent size. It's not just slowness; it's as if
the app never gets a block back at all. (We could handle slowness.)
Thanks. Open to ideas..
-Zach Giles
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
