Scott,
Thanks for the insight.
I think our problem is item 2, residual data. The volumes were cleanly
formatted for use with z/VM and Linux, but over time files are deleted and
new ones are allocated within ext2fs, and we suspect that this file system
is not zeroing out the deleted files. In other words, if we created a
10GB file system, filled it with files, then deleted all but one file,
the RVA would still think there is 10GB of data, because none of it was
zeroed out.
On z/OS, the ERASE ON SCRATCH option zeroes out the tracks containing
deleted data, so that hardware compression can take advantage of this.
Do you know if there is an option in ext2fs to have deleted files
zeroed out (such as for security purposes), which would benefit outboard
compression, or perhaps another Linux filesystem that has this feature?
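(The only userland workaround I know of is to periodically fill the free
space with a file of zeros and then delete it. A rough sketch, with the
mount point assumed:

```shell
# Fill the filesystem's free space with zeros, then delete the file.
# /mnt/data is an assumed mount point; on a real system you would let
# dd run until the filesystem fills up ("No space left on device").
dd if=/dev/zero of=/mnt/data/zero.fill bs=1M
rm /mnt/data/zero.fill
sync    # make sure the zeroed blocks actually reach the backend
```

A built-in filesystem option would obviously be preferable.)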
Thanks
Tony Pearson
Storage Software for Linux on zSeries
IBM Storage Systems Group
----- Forwarded by Tony Pearson/Tucson/IBM on 07/10/2002 01:37 PM -----
From: "Ledbetter, Scott E" <[EMAIL PROTECTED]TORTEK.COM>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]EDU>
Date: 07/10/2002 10:43 AM
To: [EMAIL PROTECTED]
Subject: Re: Linux data on IBM RVA
Please respond to: Linux on 390 Port
We have found that file system utilization, and thus compression on virtual
disk, is a somewhat nebulous topic in S/390 Linux.
Generally we have found S/390 Linux 'system' data to hover between a 2:1 and
3:1 compression ratio. This is about the same as the compression we see on
other ASCII-based open systems. User and database data sees better
compression. Here are some of the common issues we have seen:
1) Most DASD and memory vendors, including IBM and STK, use 1 million bytes
as the definition of a megabyte, and one billion bytes as the definition of
a gigabyte. Linux tends to use the 'engineering' definition, which is
1024*1024 for a megabyte, 1024*1024*1024 for a gigabyte, and so on. If
you are doing space calculations, you must reduce everything to a common
unit, the byte. If an IXFP/SVAA report shows 1000MB of backend space
being used, that is 1,000,000,000 bytes. If the Linux df command shows
1,000,000 1K blocks being used, that is 1,024,000,000 bytes. Use the
proper numbers if you are calculating compression.
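The unit conversion above can be checked with simple shell arithmetic,
using the numbers from the example:

```shell
# IXFP/SVAA report: 1000MB of backend space, vendor megabytes
echo $((1000 * 1000000))     # 1,000,000,000 bytes
# Linux df: 1,000,000 1K blocks, where 1K = 1024 bytes
echo $((1000000 * 1024))     # 1,024,000,000 bytes
```

Same "1000 megabytes" on paper, a 24MB difference in actual bytes.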
2) There is an opportunity for residual data to mess up your calculations.
For example, if you install a Linux distribution on a volume, and then do a
mke2fs on that partition and reinstall over the top of the old data without
reformatting the volume, the old track images are still out there taking up
space on the backend. mke2fs rewrites only the filesystem metadata; the old
track images are still there. For you MVS-OS/390-z/OS fans out there, it
is like running an ICKDSF minimal init on a volume and reusing it: the
metadata is rewritten, but the tracks are not reformatted.
With the newer generations of virtual architecture (StorageTek SVA), you
can use Instant Format to format the volume and eliminate the residual
data. This literally takes only a second. With the RVA you can roughly
simulate the same thing by using SnapShot to Snap an empty formatted volume
over the volume you are about to use, or set up the 'older' version of
Instant Format, which will do the Snap automatically.
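From the Linux side, a brute-force equivalent is to zero the partition
before rebuilding the filesystem, so that no residual track images survive.
A sketch only, with the device name assumed, and with the usual warning
that this destroys everything on the partition:

```shell
# DESTRUCTIVE: zeroes the entire partition. /dev/dasdb1 is an assumed
# device name; substitute your own. Every backend track becomes zeros
# (which the array can compress away), then mke2fs lays down a fresh ext2.
dd if=/dev/zero of=/dev/dasdb1 bs=4k
mke2fs -b 4096 /dev/dasdb1
```

Much slower than a Snap of an empty volume, but it needs nothing outboard.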
3) We cannot reconcile the numbers returned by the Linux df command and
what is in /proc/dasd/devices. For example, for a 2.4.7 kernel ext2
filesystem on a 3390-9 (/dev/dasdb1):
tmp05lnx:~ # df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/dasdb1            6983168    846212   5776348  13% /
shmfs                   843972         0    843972   0% /dev/shm
tmp05lnx:~ # cat /proc/dasd/devices
c723(ECKD) at ( 94: 0) is dasda:active at blocksize: 4096, 200340 blocks, 782 MB
c711(ECKD) at ( 94: 4) is dasdb:active at blocksize: 4096, 1803060 blocks, 7043 MB
The df command shows /dev/dasdb1 as having 6983168 1K blocks. But it also
shows 846212 used and 5776348 available, and 846212+5776348=6622560.
Where did the other 6983168-6622560=360608 blocks go? I assume it is
filesystem overhead, but 360MB of filesystem overhead seems a little high.
Then /proc/dasd/devices shows the partition as having 1803060 4K blocks.
This should be 7212240 1K blocks. Another 229072 1K blocks worth of
overhead? Note that 1803060*4096 does match the 7043MB number. The
6983168 blocks number from df works out to only 6819.5 MB, and the
Used+Available number of 6622560 nets out to only 6467.3 MB.
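The arithmetic above can be reproduced directly in the shell, using the
numbers from the df and /proc/dasd/devices output:

```shell
echo $((1803060 * 4))                # 7212240 1K blocks per /proc/dasd/devices
echo $((7212240 - 6983168))          # 229072 blocks lost between device and df total
echo $((846212 + 5776348))           # 6622560 = Used + Available
echo $((6983168 - 6622560))          # 360608 blocks unaccounted for within df
echo $((1803060 * 4096 / 1048576))   # 7043 MB, matching /proc/dasd/devices
```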
So the question becomes: which number do you use to calculate utilization
and compression? It appears that the 3390-9 nets only 6467MB of actual
usable space due to partition and filesystem overhead. Almost 600MB of
filesystem and partition overhead! Perhaps JFS, ext3 or others are more
efficient; I haven't measured. Someday when I have time I'm going to crack
open the source and try to figure this out. Any pointers to good
filesystem tutorials will be gratefully accepted.
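One likely culprit for the gap between the df total and Used+Available is
the ext2 reserved-blocks percentage: mke2fs reserves 5% of the blocks for
root by default (tunable with mke2fs -m or tune2fs -m, and visible in the
dumpe2fs -h header), and df's "Available" column excludes those reserved
blocks. This is a guess on my part, but checking it against the numbers
above:

```shell
# Default ext2 reservation: 5% of the filesystem's 6983168 1K blocks
echo $((6983168 * 5 / 100))    # 349158 blocks, close to the 360608-block gap
```

The small remainder would then be genuine metadata (inode tables,
superblock copies, block group descriptors).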
The bottom line is that we see 2:1-3:1 compression overall for the Linux
'system' stuff. This is a relatively low number, but much of what is on
the root volume is binaries. User data such as databases, Apache content,
email, etc. compresses much better.
Scott Ledbetter
StorageTek
-----Original Message-----
From: Tony Pearson [mailto:[EMAIL PROTECTED]]
Sent: July 09, 2002 3:12 PM
To: [EMAIL PROTECTED]
Subject: Linux data on IBM RVA
List-390 readers:
I have a customer writing their Linux data on an IBM Ramac Virtual Array
(RVA) which has its own outboard compression. They are finding that the
Linux data is not compressing at all. Has anyone else experienced this?
Any ideas?
Thanks
Tony Pearson
IBM Storage Systems - Software Architecture and Planning
Storage Software for Linux on zSeries
[EMAIL PROTECTED]
(520) 799-4309 / tieline 321-4309
______________________________________________
Tony Pearson
IBM Storage Systems - Software Strategy and Architecture
[EMAIL PROTECTED]