We have found that filesystem utilization, and therefore compression on virtual disk, is a somewhat nebulous topic in S390 Linux.
Generally we have found S390 Linux 'system' data to hover between 2:1 and 3:1 compression. This is about the same as the compression we see on other ASCII-based open systems. User and database data compress better. Here are some of the common issues we have seen:

1) Most DASD and memory vendors, including IBM and STK, use 1 million bytes as the definition of a megabyte and one billion bytes as the definition of a gigabyte. Linux tends to use the 'engineering' (binary) definition: 1024*1024 bytes for a megabyte, 1024*1024*1024 bytes for a gigabyte, and so on. If you are doing space calculations, you must convert everything to the least common denominator, the byte. If an IXFP/SVAA report shows 1000MB of backend space being used, that is 1,000,000,000 bytes. If the Linux df command shows 1,000,000 1K blocks being used, that is 1,024,000,000 bytes. Use the byte figures when you calculate compression.

2) Residual data can skew your calculations. For example, suppose you install a Linux distribution on a volume, then run mke2fs on that partition and reinstall over the top of the old data without reformatting the volume. mke2fs writes new filesystem metadata but does not reformat the underlying tracks, so the old track images are still out there taking up space on the backend. For the MVS/OS/390/z/OS fans out there, it is like running an ICKDSF minimal init on a volume and reusing it: the metadata is rewritten, but the tracks are not reformatted. With the newer generation of virtual architecture (StorageTek SVA), you can use Instant Format to format the volume and eliminate the residual data; this literally takes only a second. With RVA you can approximate the same thing by using SnapShot to snap an empty formatted volume over the volume you are about to use, or by setting up the older version of Instant Format, which does the snap automatically.

3) We cannot reconcile the numbers returned by the Linux df command with what is in /proc/dasd/devices.
For example, for an ext2 filesystem under a 2.4.7 kernel on a 3390-9 (/dev/dasdb1):

    tmp05lnx:~ # df
    Filesystem           1k-blocks      Used Available Use% Mounted on
    /dev/dasdb1            6983168    846212   5776348  13% /
    shmfs                   843972         0    843972   0% /dev/shm
    tmp05lnx:~ # cat /proc/dasd/devices
    c723(ECKD) at ( 94: 0) is dasda:active at blocksize: 4096, 200340 blocks, 782 MB
    c711(ECKD) at ( 94: 4) is dasdb:active at blocksize: 4096, 1803060 blocks, 7043 MB

The df command shows /dev/dasdb1 as having 6983168 1K blocks, but it also shows 846212 used and 5776348 available. 846212 + 5776348 = 6622560. Where did the other 6983168 - 6622560 = 360608 blocks go? I assume it is filesystem overhead, but roughly 360MB of filesystem overhead seems a little high.

Then /proc/dasd/devices shows the device (dasdb) as having 1803060 4K blocks. That should be 7212240 1K blocks, which means another 229072 1K blocks' worth of overhead. Note that 1803060*4096 bytes does match the 7043MB figure (in 1024*1024-byte megabytes). The 6983168-block figure from df works out to only 6819.5MB, and the Used + Available figure of 6622560 nets out to only 6467.3MB.

So the question becomes: which number do you use to calculate utilization and compression? It appears that the 3390-9 nets only 6467MB of actual usable space due to partition and filesystem overhead. That is almost 600MB of overhead! Perhaps JFS, ext3, or other filesystems are more efficient; I haven't measured. Someday when I have time I'm going to crack open the source and try to figure this out. Any pointers to good filesystem tutorials will be gratefully accepted.

The bottom line is that we see 2:1 to 3:1 compression overall for the Linux 'system' data. This is a relatively low ratio, but much of what is on the root volume consists of binaries. User data such as databases, Apache content, email, etc. compresses much better.
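Point 1 can be made concrete with a small sketch. It is in Python purely for illustration. The function names are hypothetical, and the 400MB backend figure in the usage lines is invented for the arithmetic, not a measurement. The sketch converts both reports to bytes before dividing. It also shows that mixing units (df blocks turned into binary MB, divided by vendor decimal MB) understates the ratio by a factor of 1.048576, about 5%.

```python
# Sketch of point 1: convert both reports to bytes before computing compression.
# Function names are hypothetical; the unit definitions come from the post.

DECIMAL_MB = 1_000_000  # IXFP/SVAA megabyte (decimal)
DF_BLOCK = 1024         # one df "1k-block"


def svaa_mb_to_bytes(mb: float) -> int:
    """Backend space reported in decimal MB -> bytes."""
    return round(mb * DECIMAL_MB)


def df_blocks_to_bytes(blocks: int) -> int:
    """df 'Used' figure in 1K blocks -> bytes."""
    return blocks * DF_BLOCK


def compression_ratio(host_bytes: int, backend_bytes: int) -> float:
    """Host-visible bytes divided by physical backend bytes."""
    if backend_bytes <= 0:
        raise ValueError("backend usage must be positive")
    return host_bytes / backend_bytes


if __name__ == "__main__":
    host = df_blocks_to_bytes(1_000_000)   # 1,024,000,000 bytes (figure from the post)
    backend = svaa_mb_to_bytes(400)        # hypothetical backend figure
    print(round(compression_ratio(host, backend), 2))   # 2.56
    # Mixing units: df blocks turned into binary MB, divided by decimal MB.
    naive = (1_000_000 / 1024) / 400
    print(round(naive, 2))                               # 2.44
```

The design choice is simply to never divide two figures until both are in bytes. Any other common unit would work, but bytes leave no room for ambiguity.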
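The arithmetic in point 3 can be rechecked with a short Python sketch that uses only the figures from the df and /proc/dasd/devices output. The function and constant names are hypothetical. One possible reading of the df gap is offered here as an assumption that we have not verified against the source. By default, mke2fs reserves 5% of an ext2 filesystem's blocks for root, and df leaves those reserved blocks out of the Available column. If so, Size - (Used + Available) would simply be the reserved count. The 360608-block gap is within a few blocks of 5% of the device, which fits that reading; if it holds, that space is usable by root rather than lost.

```python
# Sketch of point 3: reconcile df with /proc/dasd/devices for dasdb.
# Constants are copied from the output in the post; reading df_gap as a
# 5% root reservation (the mke2fs default) is an assumption, not a measurement.

PROC_BLOCKS_4K = 1803060   # /proc/dasd/devices: 4096-byte blocks on dasdb
DF_SIZE_1K = 6983168       # df 1k-blocks for /dev/dasdb1
DF_USED_1K = 846212
DF_AVAIL_1K = 5776348


def kib_to_mib(kib: int) -> float:
    """1K blocks -> megabytes of 1024*1024 bytes."""
    return kib / 1024


def reconcile() -> dict:
    device_1k = PROC_BLOCKS_4K * 4                 # 4K blocks expressed as 1K blocks
    used_plus_avail = DF_USED_1K + DF_AVAIL_1K
    df_gap = DF_SIZE_1K - used_plus_avail          # in df's Size but neither Used nor Available
    return {
        "device_1k": device_1k,
        "device_minus_df_size": device_1k - DF_SIZE_1K,
        "df_gap": df_gap,
        # Assumption: compare the gap with a 5% root reservation.
        "df_gap_fraction": df_gap / device_1k,
        "usable_mib": kib_to_mib(used_plus_avail),
    }


if __name__ == "__main__":
    for name, value in reconcile().items():
        print(f"{name:22} {value}")
```

Run as a script, it reports 7212240, 229072 and 360608 for the first three entries, matching the hand calculation in the post. It also reports a gap fraction of about 0.05 and about 6467.3MB of usable space.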
Scott Ledbetter
StorageTek

-----Original Message-----
From: Tony Pearson [mailto:[EMAIL PROTECTED]]
Sent: July 09, 2002 3:12 PM
To: [EMAIL PROTECTED]
Subject: Linux data on IBM RVA

List-390 readers:

I have a customer writing their Linux data on an IBM Ramac Virtual Array (RVA) which has its own outboard compression. They are finding that the Linux data is not compressing at all. Has anyone else experienced this? Any ideas?

Thanks
Tony Pearson
IBM Storage Systems - Software Architecture and Planning
Storage Software for Linux on zSeries
[EMAIL PROTECTED]
(520) 799-4309 / tieline 321-4309
