Thank you very much, John and Rob. I will look into the options you
mentioned, John.
The rest of this post is a fine point about issue 3, running out of storage
when a larger number of files is read. Sorry if this entry is too long.
I realized after I posted that the autoadd was accumulating master records,
as you pointed out, Rob. Thanks, I understand that. But because of the
unique nature of this pipe, which I did not explain before, the number of
master records should not grow much more for 11 files than for one file:
the pipe is an accumulator, summing other fields associated with each key
and producing the totals for each key as the final output. The key values
are largely the same in all the input files; reading 11 files finds only
about 20% more unique keys than reading one file. For each detail record
that matches, the pipe feeds lookup a delete of the existing master record
and then an add of a new master record with the same key (and the updated
sums). My guess is that storage grows for one of two reasons: (1) the
deletions are not synchronous, so deleted records accumulate for a while
(although I suspect that is not the answer), or (2) the storage used by
deleted master records is not reclaimed until the lookup stage (or maybe
the entire pipe) completes.
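To make the expected master count concrete, here is a minimal Python sketch of the same accumulation idea. It is not Pipelines code, and the function name `accumulate` and the tuple layout are hypothetical, loosely mirroring the key, appearance count, day count, and flags fields that the spec stages handle. Each detail record either becomes a new master (like autoadd) or replaces the existing master by delete-then-add, so the table should hold one entry per unique key however many files are read.

```python
"""Hypothetical sketch of the per-key accumulator (not CMS Pipelines code)."""


def accumulate(files):
    """files: a list of files, each a list of detail records
    (key, appearance_count, flags).
    Returns (totals sorted by key, peak number of master records)."""
    masters = {}  # reference table: key -> (appearance_count, day_count, flags)
    peak = 0
    for records in files:
        for key, appct, flags in records:
            if key not in masters:
                # First sighting: the detail itself becomes the master,
                # with a day count of 1 (like autoadd).
                masters[key] = (appct, 1, flags)
            else:
                # Delete the old master first, then add the updated one.
                old_app, old_day, old_flags = masters.pop(key)
                masters[key] = (old_app + appct, old_day + 1, old_flags)
            peak = max(peak, len(masters))
    # Final output: one totals record per key, in key order.
    totals = [(k, *masters[k]) for k in sorted(masters)]
    return totals, peak


files = [
    [("A", 2, "x"), ("B", 1, "y")],
    [("A", 3, "z"), ("C", 4, "w")],
]
totals, peak = accumulate(files)
print(totals)  # [('A', 5, 2, 'x'), ('B', 1, 1, 'y'), ('C', 4, 1, 'w')]
print(peak)    # 3
```

Here four detail records produce only three masters, so `peak` tracks unique keys rather than input volume. The sketch says nothing about when Pipelines actually releases the storage of deleted masters, which is the open question.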
Since I've raised the issue of how the rest of the pipe works, I guess I
should show you the entire pipe (thanks to Mike Harding for developing the
accumulator algorithm):
/* This pipe gets a huge performance gain over other methods (all of */
/* which needed the "var" stage to fetch and set subtotals), but it's */
/* hard to follow. It uses the four inputs to lookup. The trick is */
/* in feeding the master-delete to lookup's fourth input ahead of the */
/* master-add (third). Note that the tertiary output of lookup is */
/* already sorted by key. */
'PIPE (name huge end ?) stem input_files.', /* fn ft (fm|dir) etc. */
'| console', /* show details */
'| getfiles', /* all records from all files */
, /* concatenate in case take_stage is a find or nfind: */
take_stage ||, /* take only a sample in dbg mode */
'| spec w1 1 w2 nw /1/ nw w3-* nw', /* insert day count 1 as w3 */
, /* and reduce no. bytes/record */
'| acc: lookup count autoadd w1', /* don't care about count, but */
, /* forces dump of masters at end */
'| copy', /* free lookup after match */
'| spec t: w2 . d: w3 .', /* detail appearance/day counts */
'set #0:=t', /* detail appearance count */
'set #1:=d', /* detail day count */
'read t: w2 . d: w3 .', /* master appearance/day counts */
'set #0+=t', /* accumulate appearance count */
'set #1+=d', /* accumulate day count */
'w1 1', /* CUSIP */
'print #0 strip nw', /* updated appearance count */
'print #1 strip nw', /* updated day count */
'w4-* nw', /* flags */
'| w1: fanout', /* artifice for label ordering */
'? acc:', /* no secondary input or output */
'? w1:',
'| w2: not fanout', /* feed 2nd (lookup 4th) first */
'| acc:', /* tertiary input, add to master */
'| not chop 10', /* 3rd o/p, gr totals; drop count */
, /* cusip, appct, dayct, <flags> */
out_stages, /* 1 or 2 sets of 1st 3 files */
'? w2:', /* fanout tertiary */
'| acc:', /* 4th input, delete from master */
'? var rexxpgm', /* accum flag1 value totals */
'| f1:' /* to 2nd f1 input in out_stages */
take_stage, out_stages, and rexxpgm should not affect the storage issue, but
I will provide them if anyone wants to see them.
Thanks again, folks.