Re: [CMS-PIPELINES] Idioms to enrich a file with (big or not) reference file?

Rob van der Heij Wed, 05 Oct 2011 06:49:37 -0700

On Wed, Oct 5, 2011 at 3:35 PM, DUGALEIX Michaël
<[email protected]> wrote:
> Hello,
>
> Is there an idiom to enrich a file with another file ...
> 1. ... if the reference file is not too large ?
> 2. ... if the reference file is very large ?
>
> - For a "not too large" reference, I did it with "LOOKUP ... DETAIL
> ALLMASTER PAIRWISE" + "JOIN" + "SPEC", but is there a better way?
> - For a "very large" reference, I'll try to use "COLLATE", but it has no
> "PAIRWISE" option, so I would prefix each file with a different
> character before collating, and then process the result. But that seems
> complex. Any simpler idea?


With ALLMASTER you basically 'multiply' the tables: each detail record
is paired with every master record that has the same key. Your example
does not show that you need that. If you do, COLLATE can't do the trick,
because it steps through both detail and master in a single pass, so it
can't produce multiple masters for one detail record. However, you could
pre-process the master with a 'join keylength' and unravel that later.
This leaves you with a single master record for each key, and COLLATE
could do the trick if both files are sorted.
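A rough sketch of that data flow, written in Python only to show the logic (it is not Pipelines syntax). It assumes plain string records whose key is the first `keylen` characters, and both inputs already sorted by key; `group_masters` and `merge_enrich` are hypothetical names. The grouping step plays the role of the 'join keylength' pre-pass, the single forward merge plays the role of COLLATE, and the inner loop is the "unravel" that restores the ALLMASTER-style multiplication.

```python
from itertools import groupby
from typing import Iterable, Iterator


def group_masters(masters: Iterable[str], keylen: int) -> Iterator[tuple[str, list[str]]]:
    """Collapse consecutive master records with the same key into one entry
    (the effect of a 'join keylength' pre-pass). Input must be sorted by key."""
    for key, recs in groupby(masters, key=lambda r: r[:keylen]):
        yield key, list(recs)


def merge_enrich(details: Iterable[str], masters: Iterable[str],
                 keylen: int) -> Iterator[tuple[str, str]]:
    """Single forward pass over two key-sorted streams (in the spirit of
    COLLATE), then unravel each grouped master so every detail is paired
    with every master sharing its key. Unmatched details are dropped."""
    grouped = group_masters(masters, keylen)
    current = next(grouped, None)
    for det in details:
        k = det[:keylen]
        # Advance the master side until its key is >= the detail key;
        # it never moves backwards, so each stream is read only once.
        while current is not None and current[0] < k:
            current = next(grouped, None)
        if current is not None and current[0] == k:
            for mas in current[1]:  # unravel: one output pair per master
                yield det, mas
```

Because the master side only holds one key group at a time, memory use is bounded by the largest group rather than by the whole master file.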

In general, when the master is big and the number of detail records is
small, it may help to reverse the streams, so that the small detail set
becomes the reference table and the big master file flows past it (only
if keys are unique in both).
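A minimal Python sketch of the reversed roles, assuming string records keyed on the first `keylen` characters and unique keys on both sides, as the condition above requires. The function name `enrich_reversed` is hypothetical and this only models the idea; it is not how LOOKUP itself is implemented.

```python
from typing import Iterable, Iterator


def enrich_reversed(details: Iterable[str], masters: Iterable[str],
                    keylen: int) -> Iterator[tuple[str, str]]:
    """Hold the (small) detail set in memory and stream the (large)
    master file past it. Keys must be unique on both sides, so each
    match is exactly one detail/master pair."""
    table: dict[str, str] = {}
    for det in details:
        key = det[:keylen]
        if key in table:
            # The reversal is only valid with unique detail keys.
            raise ValueError(f"duplicate detail key {key!r}")
        table[key] = det
    for mas in masters:  # one sequential pass over the big file
        det = table.get(mas[:keylen])
        if det is not None:
            yield det, mas
```

Results come out in master order rather than detail order, so sort afterwards if the original detail order matters.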

Long ago, I did write a stage that does a binary search in a disk
file. I think I even combined that with my lookup-based cache to
reduce the number of disk reads. I can dig between the copper pipes in
the shed, if that helps ;-)
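Rob's stage is not shown in the thread, so the following is only an illustration of the general idea, not his code. It is a Python sketch that assumes a file of fixed-length records (every record exactly `reclen` bytes) sorted by a key in the first `keylen` bytes; the class name `SortedRecordFile` and its parameters are hypothetical. `functools.lru_cache` stands in for the "lookup-based cache": the first probes of every search hit the same few records near the middle of the file, so caching them saves repeated disk reads.

```python
import os
from functools import lru_cache
from typing import Optional


class SortedRecordFile:
    """Binary search over a file of fixed-length records sorted by a
    leading key (hypothetical layout: reclen-byte records, key in the
    first keylen bytes)."""

    def __init__(self, path: str, reclen: int, keylen: int, cache_size: int = 1024):
        self.f = open(path, "rb")
        self.reclen = reclen
        self.keylen = keylen
        self.count = os.path.getsize(path) // reclen
        # Cache record reads by index so repeated probes skip the disk.
        self._read = lru_cache(maxsize=cache_size)(self._read_uncached)

    def _read_uncached(self, index: int) -> bytes:
        self.f.seek(index * self.reclen)
        return self.f.read(self.reclen)

    def find(self, key: bytes) -> Optional[bytes]:
        """Return the record whose key equals `key` (padded to keylen
        bytes by the caller), or None if it is not in the file."""
        lo, hi = 0, self.count - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            rec = self._read(mid)
            k = rec[: self.keylen]
            if k == key:
                return rec
            if k < key:
                lo = mid + 1
            else:
                hi = mid - 1
        return None

    def close(self) -> None:
        self.f.close()
```

Each lookup costs about log2(n) record reads, which is why this pays off when only a few detail records are looked up in a very large reference file.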

Rob
