Actually, giving the question some serious consideration: at a reasonably
good cache hit ratio, the impact of an infinite number of jobs randomly
reading the same dataset will not be significantly different from that of
an infinite number of jobs accessing many datasets and volumes on the same
shared channels.

At high cache hit rates, elongation usually comes from path busy: PEND time
for channel busy on ESCON, frame interleaving on FICON, or microprocessor
utilisation at either end of the channel.
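As a rough illustration of how path busy elongates response time, here is a
back-of-the-envelope sketch using the simple M/M/1 approximation R = S/(1-U).
The service time and utilisation points are made-up illustrative values, not
measurements from any real channel:

```python
# Rough sketch: response-time elongation from path (channel) busy,
# using the simple M/M/1 approximation R = S / (1 - U).
# service_ms and the utilisation points are assumed, illustrative values.

def elongated_response_ms(service_ms: float, utilisation: float) -> float:
    """Estimated response time at a given path utilisation (0 <= U < 1)."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_ms / (1.0 - utilisation)

service_ms = 0.5  # assumed cache-hit service time
for u in (0.1, 0.5, 0.8, 0.9, 0.95):
    print(f"U={u:.2f}  R={elongated_response_ms(service_ms, u):.2f} ms")
```

Note the knee: response time roughly doubles at 50% busy but blows up past
90%, which matches the 'knee curve' behaviour described below.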

For random access with poor cache hit ratios, or sequential read, the RAID
scheme will have a much greater bearing on throughput, as will the scheme
employed for pre-fetch in the case of sequential read. 

For sequential read, good old RAID-1 is probably at the bottom of the pile,
as it only employs two spindles and the pre-fetch scheme used by the only MF
RAID-1 vendor simply flip-flops between the spindles instead of pre-fetching
from both concurrently. RAID-10 schemes that can pre-fetch from all the
disks in parallel would be at the top, with RAID-5/6 not far behind and
standard RAID-10 a bit behind RAID-5/6.
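To put hypothetical numbers on that ranking, here is a back-of-the-envelope
sketch driven only by how many spindles the pre-fetch can stream from at
once in an eight-drive group. The per-spindle rate and the effective-spindle
counts are my assumptions, not vendor figures:

```python
# Back-of-the-envelope sequential-read throughput by RAID scheme, for an
# eight-drive group, assuming throughput scales with the number of spindles
# the pre-fetch can stream from concurrently. All numbers are illustrative.

per_spindle_mb = 50.0  # assumed MB/s per spindle

effective_spindles = {
    "RAID-1 (flip-flop pre-fetch)": 1,  # alternates between the two copies
    "RAID-10 (standard)":           4,  # streams from one side of each mirror
    "RAID-6 (6+2P)":                6,  # two parity drives carry no data
    "RAID-5 (7+P)":                 7,  # one parity drive carries no data
    "RAID-10 (pre-fetch all)":      8,  # reads both halves of every mirror
}

for scheme, n in sorted(effective_spindles.items(), key=lambda kv: kv[1]):
    print(f"{scheme:30s} ~{n * per_spindle_mb:.0f} MB/s")
```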

For random access it pretty much depends on how many spindles can be used
concurrently as one volume; RAID-10, RAID-5 and RAID-6 using eight drives
would all give the same performance for a single dataset.
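The reason the schemes converge for random reads is that parity is not read
on the read path, so the ceiling is roughly spindles times per-spindle IOPS
regardless of layout. A minimal sketch, with an assumed per-spindle rate:

```python
# For random reads with a poor cache hit ratio, the rough ceiling is
# drives * per-spindle IOPS, independent of the parity scheme (parity is
# not read on the read path). Illustrative numbers, not vendor figures.

per_spindle_iops = 150  # assumed for a single enterprise spindle
drives = 8

for scheme in ("RAID-10", "RAID-5", "RAID-6"):
    print(f"{scheme}: ~{drives * per_spindle_iops} random read IOPS")
```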

> 
> The SWAG ROT is that there is no free lunch, and each additional job
> will degrade performance to some degree. The first few may not be
> measurable. The degradation may follow a classic 'knee curve' (rise
> slowly to a point then abruptly get much worse).
> 

Hiperbatch is a great tool, but last time I looked it did not support EFDS
:( Pre-loading is not the only way to use Hiperbatch; it has a great
'catch algorithm' that keeps track of the leading and following jobs,
allowing the followers to get bursts of speed from the Hiperbatch buffer.
The best way I have found to use Hiperbatch without pre-loading is to kick
off all the jobs that will be using it at the same time and give them the
same service class. You want them to synchronise around the same area of the
file, so one job does the reading and the rest are just behind, reading from
the Hiperbatch buffer. If some jobs are accessing too many different parts
of the file, then Hiperbatch does not work that well.
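The leading/following pattern above can be sketched as a toy simulation: the
lead job reads from disk, and trailing jobs that stay close behind are
served from a shared buffer. This is purely illustrative, not the real
catch algorithm:

```python
# Toy sketch of the leading/following reader pattern Hiperbatch exploits:
# the first job to touch a block goes to disk, jobs just behind it are
# served from a shared buffer. Illustrative only, not the real algorithm.
from collections import OrderedDict

class SharedBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> data
        self.disk_reads = 0
        self.buffer_hits = 0

    def read(self, block: int) -> str:
        if block in self.blocks:
            self.buffer_hits += 1
            return self.blocks[block]
        self.disk_reads += 1         # lead job goes to disk
        data = f"block-{block}"
        self.blocks[block] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the oldest block
        return data

buf = SharedBuffer(capacity=10)
# Three jobs kicked off together, synchronised on the same area of the file:
for block in range(100):
    for job in range(3):
        buf.read(block)
print(buf.disk_reads, buf.buffer_hits)  # prints "100 200"
```

If the jobs drift apart by more than the buffer capacity, the followers
start missing and go back to disk, which is why jobs scattered across the
file get little benefit.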

Finally, if the dataset is 4-5GB and you want a ton of jobs to read it as
fast as possible, then if you have HDS arrays consider putting the file in
FlashAccess. This is functionally the same as Solid State Disk (remember
those?), and it won't matter what sort of disk and parity scheme you are
using.

Ron

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
