On Wed, Oct 5, 2011 at 3:35 PM, DUGALEIX Michaël <[email protected]> wrote:
> Hello,
>
> Is there an idiom to enrich a file with another file ...
> 1. ... if the reference file is not too large ?
> 2. ... if the reference file is very large ?
>
> - For a "not too large" reference, I had done it with a "LOOKUP ... DETAIL
> ALLMASTER PAIRWISE" + "JOIN" + "SPEC", but is there a better way ?
> - For a "very large reference", I'll try and use a "COLLATE", but it has no
> "PAIRWISE" option, so I'd try and prefix each file with a different
> character before collating, and work the result. But that seems complex. Any
> simple idea ?
With ALLMASTER you basically 'multiply' the tables. Your example does not show that you need that. If you do, collate can't do the trick, because it steps through both detail and master in a single pass, so it can't produce multiple masters for a detail record. However, you could pre-process the master with a 'join keylength' and unravel that later. This leaves you with a single master record for each key, and collate could then do the trick, provided both files are sorted.

In general, when the master is big and the number of detail records is small, it may help to reverse the streams (only if keys are unique in both).

Long ago, I wrote a stage that does a binary search in a disk file. I think I even combined that with my lookup-based cache to reduce the number of disk reads. I can dig between the copper pipes in the shed, if that helps ;-)

Rob
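
For illustration only, here is a rough sketch in Python (not CMS Pipelines) of the "collapse the master per key, match in a single pass, unravel later" idea. The function names and the (key, value) record layout are made up for the example and do not correspond to any Pipelines stage. It assumes both inputs are already sorted by key.

```python
from itertools import groupby


def collapse_masters(masters):
    """Turn key-sorted (key, value) masters into one (key, [values]) record per key."""
    for key, group in groupby(masters, key=lambda rec: rec[0]):
        yield key, [value for _, value in group]


def merge_pass(details, masters):
    """Single pass over two key-sorted streams; master keys must be unique.

    Yields (key, detail_value, master_values) for a match and
    (key, detail_value, None) when no master has that key.
    """
    masters = iter(masters)
    current = next(masters, None)
    for key, value in details:
        # Advance the master stream until it catches up with the detail key.
        while current is not None and current[0] < key:
            current = next(masters, None)
        if current is not None and current[0] == key:
            yield key, value, current[1]
        else:
            yield key, value, None


def unravel(merged):
    """Expand each matched detail into one record per original master."""
    for key, value, master_values in merged:
        for master_value in master_values or [None]:
            yield key, value, master_value


details = [("a", 1), ("b", 2), ("b", 3), ("d", 4)]
masters = [("a", "x"), ("b", "y"), ("b", "z"), ("c", "w")]

result = list(unravel(merge_pass(details, collapse_masters(masters))))
```

Because the master stream is never rewound, each detail can only see the one master record currently under the cursor. That is why the masters have to be collapsed to one record per key before the pass, and expanded again afterwards.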
