On Thu, Oct 4, 2012 at 4:47 PM, David Speake <[email protected]> wrote:

> One of my application programmers asked for information on a design decision.
>
> Given two files that are not in order by the match key.
> Assume the smaller file has at most one record per possible key value.
> The larger file may have multiple records for some keys and may also have
> values not found in the smaller file.
> Assume optimal blocking, buffering, both files, both solutions.
> Which is more efficient?
>
> Sort the smaller by that key, load to VSAM, then pass the unordered larger
> file and do random retrievals from the VSAM file.
>
> Sort both PS files by key in question and pass sorted files for matching
> by key.
>
> We both (sorta) think the answer is ... "It depends" but on what criteria?
>
> Number of records in each file? Which is more important? And by how much?
> Record lengths? Same as above?
> Length of key field in question? (Within SORT and VSAM length restrictions
> of course).
> Key bias in larger file?
> Ratio of hits/non hits?
>

If the small file is of a reasonable size, load it into memory, sort it in
memory, and then do a binary search on the in-memory table for each record
in the large file.

The in-memory sort can be simplified by calling sort programmatically.
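
To make the idea concrete, here is a minimal sketch in Python rather than
in whatever language the application actually uses. The function names
(build_table, lookup, match) and the tuple record layout are made up for
illustration; only the approach (sort the small file once, then binary
search it for every large-file record) comes from the suggestion above.

```python
from bisect import bisect_left


def build_table(small_records, key):
    """Sort the small file's records by key; return parallel key/record lists."""
    table = sorted(small_records, key=key)   # the one-time in-memory sort
    keys = [key(rec) for rec in table]       # keys only, for the binary search
    return keys, table


def lookup(keys, table, wanted):
    """Binary search for `wanted`; return the matching record or None."""
    i = bisect_left(keys, wanted)
    if i < len(keys) and keys[i] == wanted:
        return table[i]
    return None                              # key not present in the small file


def match(small_records, large_records, small_key, large_key):
    """Yield (large_record, small_record_or_None) in large-file order."""
    keys, table = build_table(small_records, small_key)
    for rec in large_records:                # large file is read unsorted
        yield rec, lookup(keys, table, large_key(rec))


if __name__ == "__main__":
    # Hypothetical sample data: (key, payload) tuples.
    small = [("C", "c"), ("A", "a"), ("B", "b")]
    large = [("B", 1), ("Z", 2), ("A", 3), ("B", 4)]
    for big, hit in match(small, large, lambda r: r[0], lambda r: r[0]):
        print(big, hit)
```

With n records in the small file and N in the large one, this costs one
sort of n records plus roughly N * log2(n) key comparisons, and the large
file is never sorted at all. It relies on the stated assumption that the
small file has at most one record per key.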


> Anyone have a nice formula?  :-)
>
> David Speake
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
>

