------------------------------<snip>------------------------------
I'm wondering if your primary rule of not compressing a file unless it will exceed its architectural limit may have kept you from encountering cases where compression is not a waste of time.

Synchronous remote copy is one area where compressing datasets created or updated in the batch critical path can buy back the impact of write elongation, and sometimes leave you with improved elapsed times. For example, where TC, SRDF, or PPRC introduces a 100% write response time impact in a father-to-son update, but the input and output files are compressed to 40% of their original size, the result is a net run time improvement of 20% due to IO reduction.
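That 20% figure falls out of the arithmetic directly. Here's a back-of-the-envelope sketch using the hypothetical numbers from the example above (the throughput figure and data volume are made up for illustration; only the 100% elongation and 40% compression ratio come from the text):

```python
def write_io_time(gb_written, mb_per_sec, remote_copy_penalty, compress_ratio):
    """Relative write IO time: data volume after compression,
    elongated by the synchronous remote copy penalty."""
    effective_gb = gb_written * compress_ratio
    base_seconds = effective_gb * 1024 / mb_per_sec
    return base_seconds * (1 + remote_copy_penalty)

baseline = write_io_time(100, 200, 0.0, 1.0)   # no mirror, no compression
mirrored = write_io_time(100, 200, 1.0, 1.0)   # sync mirror: 100% write elongation
both     = write_io_time(100, 200, 1.0, 0.4)   # mirror plus compression to 40%

print(f"mirrored vs baseline:        {mirrored / baseline:.2f}x")
print(f"mirror+compress vs baseline: {both / baseline:.2f}x")
```

The doubled write time multiplied by the 0.4 size ratio gives 0.8x the original write IO time, i.e. the 20% net improvement claimed above.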

Another example is datasets written or updated once and then read by many programs. Even a size reduction of only 40% yields a corresponding reduction in IO activity, multiplied by the number of times the file is read. With LZW compression there is only a small CPU Time increase to decompress the file on each read, usually around 15-20% of the CPU cost of compressing it. The more you read the dataset, the more benefit you obtain.
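The asymmetry is easy to see in a toy model. This sketch (my numbers, not from the original post, and deliberately expressing IO saved and CPU spent in the same arbitrary units for comparison) shows how the one-time compression cost is amortized while the IO saving grows with every read:

```python
def net_benefit(reads, io_per_read, size_ratio, compress_cpu, decompress_fraction=0.2):
    """Toy model: IO units saved minus CPU units spent over 'reads' passes.
    size_ratio       - compressed size as a fraction of original (0.6 = 40% saved)
    decompress_fraction - decompress CPU per read as a fraction of compress CPU
    """
    io_saved = reads * io_per_read * (1 - size_ratio)
    cpu_spent = compress_cpu * (1 + decompress_fraction * reads)
    return io_saved - cpu_spent

# Benefit grows roughly linearly with read count once past break-even.
for n in (1, 5, 20):
    print(f"{n:3d} reads -> net benefit {net_benefit(n, 100, 0.6, 30):.1f}")
```

With a symmetric algorithm you would set `decompress_fraction=1.0`, and the CPU term climbs as fast as the IO saving, which is exactly the Huffman objection raised further down.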

These are two examples where I have used DFSMS compression to make significant improvements to batch run time. The only hiccup was when IBM moved the compression assist instructions to firmware on the G3 CMOS processor (it may have been G4). They were back in the hardware by G6 and all the advantages of asymmetric CPU cost for compress/decompress returned.

Huffman may not be an appropriate compression algorithm here, because the implementations I have experience with (e.g. DFSMShsm, DFSMSdss) cost the same to compress and decompress. That symmetrical CPU cost heavily erodes the value of a compressed input file.
------------------------------<unsnip>-------------------------------
In the case of ARCHIVER, I do compression because I have to handle multiple record formats and lengths. Compression was easier than trying to devise a segmentation scheme that was sufficiently flexible.

Never mind the space issues that can arise.

My current private archive contains nearly 15 million logical records of source/CLIST/REXX and fits in about 250 cylinders of VSAM cluster.

Rick

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
