Re: dfsort

Sri h Kolusu Wed, 24 Apr 2019 08:50:17 -0700

> I am wondering/believing that JOINKEYS could do all of this in one
single swoop..> Now, waiting for Sri ... ;-)


Elardus Engelbrecht.

Like minds think alike. : ) Massimo has already posted a solution using
Joinkeys. However he came up with an idea where the input file is SORTED
twice ( Subtask 1 and subtask2 ). For smaller files this is acceptable, but
if the input file have millions of records then it is advisable NOT to SORT
the file twice.  The idea of matching the file to itself is to use a COPY
operation, so that you don't require any resources.  If do have to SORT the
data, then I would prefer 1 pass of sorting the data like shown below,

Just evaluate description and if it is 'CORPORATIVO' then have a temporary
value of zero at position 81 and it is anything other than the required
string , then have the temp value as 1.

Sort on the Item_nbr and Sum the temporary value at 81.  By doing so, any
item_nbr that has a different value other than 'CORPORATIVO' will have a
total greater than 1 and those are the records that you need to eliminate.

//STEP0100 EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD *
   4046340¦CORPORATIVO                 UNIQUE - PICK
   4046564¦CORPORATIVO                 HAS ANOTHER DESC - DROP
   4046564¦ESTADO                      HAS ANOTHER DESC - DROP
   4047131¦CORPORATIVO                 UNIQUE - PICK
   4047460¦CORPORATIVO                 UNIQUE - PICK
   4047479¦CORPORATIVO                 UNIQUE - PICK
   4047480¦CORPORATIVO                 HAS ANOTHER DESC - DROP
   4047480¦RON THOMAS                  HAS ANOTHER DESC - DROP
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  INREC OVERLAY=(81:12,20,CHANGE=(3,C'CORPORATIVO',C'000'),
                         NOMATCH=(C'001'))

  SORT FIELDS=(01,10,CH,A)
   SUM FIELDS=(81,3,ZD)

  OUTFIL OMIT=(81,3,ZD,GT,0),
  BUILD=(1,80)
//*


The output from this job is

   4046340¦CORPORATIVO                 UNIQUE - PICK
   4047131¦CORPORATIVO                 UNIQUE - PICK
   4047460¦CORPORATIVO                 UNIQUE - PICK
   4047479¦CORPORATIVO                 UNIQUE - PICK
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK

Notice how the Item_Nbr   "4047481"  has 2 records and both of them have
CORPORATIVO, but we write out only 1 record

Since OP has shown the data to be already sorted, we can use JOINKEYS to
drop the records that have a different description other than
'CORPORATIVO'  by matching the file to itself. Subtask1 will include all
the records with  'CORPORATIVO'  and Subtask2 will include records other
than  'CORPORATIVO' .  The options SORTED,NOSEQCK will make Joinkeys use a
COPY operation.


//STEP0100 EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//INA      DD *
   4046340¦CORPORATIVO                 UNIQUE - PICK
   4046564¦CORPORATIVO                 HAS ANOTHER DESC - DROP
   4046564¦ESTADO                      HAS ANOTHER DESC - DROP
   4047131¦CORPORATIVO                 UNIQUE - PICK
   4047460¦CORPORATIVO                 UNIQUE - PICK
   4047479¦CORPORATIVO                 UNIQUE - PICK
   4047480¦CORPORATIVO                 HAS ANOTHER DESC - DROP
   4047480¦RON THOMAS                  HAS ANOTHER DESC - DROP
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK
//INB      DD *
   4046340¦CORPORATIVO                 UNIQUE - PICK
   4046564¦CORPORATIVO                 HAS ANOTHER DESC - DROP
   4046564¦ESTADO                      HAS ANOTHER DESC - DROP
   4047131¦CORPORATIVO                 UNIQUE - PICK
   4047460¦CORPORATIVO                 UNIQUE - PICK
   4047479¦CORPORATIVO                 UNIQUE - PICK
   4047480¦CORPORATIVO                 HAS ANOTHER DESC - DROP
   4047480¦RON THOMAS                  HAS ANOTHER DESC - DROP
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  OPTION COPY
  JOINKEYS F1=INA,FIELDS=(1,10,A),SORTED,NOSEQCK
  JOINKEYS F2=INB,FIELDS=(1,10,A),SORTED,NOSEQCK
  JOIN UNPAIRED,F1,ONLY
//*
//JNF1CNTL DD *
  INCLUDE COND=(12,20,CH,EQ,C'CORPORATIVO')
/*
//JNF2CNTL DD *
  INCLUDE COND=(12,20,CH,NE,C'CORPORATIVO')
/*

The output from this job is

   4046340¦CORPORATIVO                 UNIQUE - PICK
   4047131¦CORPORATIVO                 UNIQUE - PICK
   4047460¦CORPORATIVO                 UNIQUE - PICK
   4047479¦CORPORATIVO                 UNIQUE - PICK
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK
   4047481¦CORPORATIVO                 DUPLICATE DESC   - PICK

Notice how we picked the 2 duplicate records for the item_Nbr  "4047481"

If the data is NOT sorted and you still want to use JOINKEYS method then,
then you can optimize the job a bit

On the Subtask2, all you care is just the Item_Nbr, so that you can match
it. So build the Subtask2 with just the Item_Nbr using INREC statement.
This will reduce the resources required to sort ( 80 byte sorting vs 10
byte sorting).  You also do NOT require SUM FIELDS=NONE as the intention is
to DROP the matched records. And since we are using JOIN UNPAIRED,F1,ONLY
the matched records will be dropped even if the Join ended up with a
Cartesian join.

The updated control cards for Joinkeys would be as follows.

//SYSIN    DD *
  OPTION COPY
  JOINKEYS F1=INA,FIELDS=(1,10,A)
  JOINKEYS F2=INB,FIELDS=(1,10,A)
  JOIN UNPAIRED,F1,ONLY
//*
//JNF1CNTL DD *
  INCLUDE COND=(12,20,CH,EQ,C'CORPORATIVO')
/*
//JNF2CNTL DD *
  INCLUDE COND=(12,20,CH,NE,C'CORPORATIVO')
  INREC BUILD=(1,10)
/*


Further if you have any questions please let me know

Thanks,
Kolusu
DFSORT Development
IBM Corporation


----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN

Re: dfsort

Reply via email to