On Thu, Oct 4, 2012 at 4:47 PM, David Speake <[email protected]> wrote:
> One of my application programmers asked for information on design decision.
>
> Given two files that are not in order by the match key.
> Assuming the smaller file has at most one record per possible key value.
> Larger file may have multiple records for some keys and may also have
> values not found in the smaller file.
> Assume optimal blocking, buffering, both files, both solutions.
> Which is more efficient?
>
> Sort the smaller by that key, load to VSAM, then pass the unordered larger
> file and do random retrievals from the VSAM file.
>
> Sort both PS files by key in question and pass sorted files for matching
> by key.
>
> We both (sorta) think the answer is ... "It depends" but on what criteria?
>
> Number of records in each file? Which is more important? And by how much?
> Record lengths? Same as above?
> Length of key field in question? (Within SORT and VSAM length restrictions
> of course).
> Key bias in larger file?
> Ratio of hits/non hits?
>

If the small file is of a reasonable size, load it into memory, sort it in memory, and then do a binary search on the in-memory table for each record in the large file. The in-memory sort can be simplified by calling sort programmatically.

> Anyone have a nice formula? :-)
>
> David Speake
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
> ----------------------------------------------------------------------
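
For illustration only, here is a minimal sketch of the in-memory approach suggested in the reply above: sort the small file once, then binary-search it for each record of the unsorted large file. It is written in Python rather than a mainframe language purely for brevity. The function names (`build_table`, `lookup`, `match`), the tuple record layout, and the key extractors are all assumptions made for this sketch, not anything from the original post.

```python
from bisect import bisect_left

def build_table(small_records, key):
    """Sort the small file's records by key; return parallel key and record lists."""
    table = sorted(small_records, key=key)
    keys = [key(rec) for rec in table]
    return keys, table

def lookup(keys, table, k):
    """Binary search for k; return the matching record, or None on a miss."""
    i = bisect_left(keys, k)
    if i < len(keys) and keys[i] == k:
        return table[i]
    return None

def match(small_records, large_records, small_key, large_key):
    """Pass the large file once, in its original order, pairing each
    record with its match from the small file (None when there is none)."""
    keys, table = build_table(small_records, small_key)
    for rec in large_records:
        yield rec, lookup(keys, table, large_key(rec))

# Hypothetical sample data: (key, payload) tuples.
small = [("C", "carol"), ("A", "alice"), ("B", "bob")]
large = [("B", 1), ("Z", 2), ("A", 3), ("B", 4)]
for big, hit in match(small, large, lambda r: r[0], lambda r: r[0]):
    print(big, hit)
```

The sketch relies on the stated assumption that the small file has at most one record per key. The large file is never sorted, so its cost is one pass plus roughly log2(N) key comparisons per record, where N is the number of records in the small file.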
