Melinda asked me to pass this along.

The end of that discussion in July of 1999 was this REXX stage by Michael
Faulhaber:

/* CSVCLN REXX: brush-up a CSV file                                  */
/*       (CSV = character separated values, a table transfer format) */
/* Operation: change separating comma to TAB, single doubled quotes  */
/*            and strip quotes from quoted values                    */
/*                                                                   */
/*   +----------------------------------------------------------+    */
/*   |                                                          |    */
/*   | >>--CSVCLN--+------------------------------------+---->< |    */
/*   |             |--oldsep---+-----------------------+|       |    */
/*   |                         |--newsep---+----------+|        |    */
/*   |                                     |--nixbix--|         |    */
/*   |                                                          |    */
/*   +----------------------------------------------------------+    */
/*                                                                   */
/* OLDSEP  specify the separation character used in input CSV,       */
/*         default is comma (,)                                      */
/* NEWSEP  define the separation character in output,                */
/*         default is TAB (hex 05)                                   */
/*         (OLDSEP and NEWSEP may be defined in hex or as a single   */
/*          character, but % or \ will fail)                         */
/* NIXBIX  is used as a place-holder for treatment of double quotes  */
/*         its occurrence in input is not tested                     */
/* Note: if you specify a wrong OLDSEP, output is chopped at quotes  */
/*                                                 .....Mike  190299 */
/* ----------------------------------------------------------------- */
/* Change 050799: Now doubled speed by Melinda's suggestion.         */
/* Note 1: Now a wrong OLDSEP will chop at leading quotes.           */
/* Note 2: Now two new "nixbix" are hardcoded: x00 and x01 should    */
/*          not occur in input.                                      */
/* ----------------------------------------------------------------- */
trace o                               /* in case of stall            */
signal on novalue                     /* No uninitialised variables  */
signal on failure                     /* Allow RC > 0 for a moment   */
'STAGENUM'                            /* Where are we?               */
first? = RC = 1                       /* first or not?               */
'MAXSTREAM IN'                        /* Check only one stream       */
signal on error                       /* now stop for any error      */
if RC ^= 0 then 'ISSUEMSG 264 PIPCSV' /* too many streams: crash     */
if first? then 'ISSUEMSG 127 PIPSCV'  /* if first: stop and say why  */
parse arg oc nc ph z                  /* args in mixed case          */
if z ^='' then 'ISSUEMSG 111 PIPSCV "' z '"'        /* too many args */
if oc = '' then oc = ','              /* default old sep char        */
else do                               /* user defined sep char       */
 oc = strip(oc)                       /* no blanks around it         */
 l = length(oc)                       /* how many letters?           */
 select                               /* marginal test               */
  when l = 1 then nop                 /* a single character is ok    */
  when l = 2 then oc = x2c(oc)        /* hex assumed                 */
  otherwise 'ISSUEMSG 50 PIPSCV "' oc '"'  /* invalid argument: stop */
end; end                              /* end Select and Else Do      */
if nc = '' then nc = '05'x            /* default new sep char        */
else do                               /* user defined sep char       */
 nc = strip(nc)                       /* no blanks around it         */
 l = length(nc)                       /* how many letters?           */
 select                               /* marginal test               */
  when l = 1 then nop                 /* a single character is ok    */
  when l = 2 then nc = x2c(nc)        /* hex assumed                 */
  otherwise 'ISSUEMSG 50 PIPSCV "' nc '"'  /* invalid argument: stop */
end; end                              /* end Select and Else Do      */
if ph = '' then ph = 'NixBix'         /* default place holder for "" */
'CALLPIPE (sep % end \ name CSVCLN.REXX) *:',   /* ------- in ------ */
  '% change /""""/'ph'/',             /* single inch-sign            */
  '% change /'oc'"""/'oc'"'ph'/',     /* leading inch-sign           */
  '% change 1.3 /"""/"'ph'/',         /* same in fst column          */
  '% change /'oc'""/'oc'/',           /* empty cells                 */
  '% change 1.2 /""//',               /* same in fst column          */
  '% change /""/'ph'/',               /* example: "zizu = 8"" etc."  */
  '% strip trailing' oc,          /* no empty cells at end of record */
  '% xlate 1-* 40 00',                /* Mask the blanks.            */
  '% tokenize /"/ x01',               /* Tokenize and delimit.       */
  '%q:outside /"/ /"/',               /* branch quoted values        */
  '% xlate 1-*' oc nc,                /* replace old by new sep-char */
  '%f:faninany',                      /* collect all parts of record */
  '% deblock linend 01 terminate',    /* Re-form original records.   */
  '% change /'ph'/"/',                /* place holder to inch-sign   */
  '% xlate 1-* 00 40',                /* Unmask the blanks.          */
  '%*:',                              /* ----------- out ----------- */
  '\q:',                              /* from OUTSIDE                */
  '% nfind "' ||,                     /* Get rid of the quotes.      */
  '%f:'                               /* to FANINANY                 */
failure:; error: exit (RC * (RC ^= 12 & RC ^= 8))   /* RC = 0 if EOF */
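
For anyone who wants to try it, here is a minimal sketch of a driver.
It is not part of Mike's original post. It assumes the stage above is
filed as CSVCLN REXX on an accessed disk; the EXEC name and both file
names are made up for illustration.

```rexx
/* CSV2TAB EXEC: hypothetical driver, not part of the original post  */
/* Assumes the stage above is filed as CSVCLN REXX on an accessed    */
/* disk. INPUT CSV A and OUTPUT TXT A are made-up file names.        */
address command
'PIPE < INPUT CSV A',                 /* read the CSV file           */
   '| csvcln',                        /* defaults: comma in, TAB out */
   '| > OUTPUT TXT A'                 /* write the cleaned records   */
exit RC                               /* pass on the pipeline's RC   */
```

For input separated by semicolons, the stage would be written as
'| csvcln ; 05' instead: a one-character OLDSEP is taken literally and
a two-character NEWSEP is taken as hex, per the argument checks in the
stage. The header already warns that % and \ will fail as separators;
in a driver like this one, | would presumably clash as well, because it
is the stage separator of the PIPE command.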

On 22 August 2015 at 11:28, Rod Furey <[email protected]> wrote:

> We went over this stuff about 17/18 years ago in Pipeline CForum. What did
> Melinda come up with then and would it be useful in this case?
>
> Rod
> On 21 Aug 2015 22:46, "Rob Van der Heij" <[email protected]> wrote:
>
> > > Thus, the only viable way to process CSV data correctly (i.e.,
> > > compensating for downloading errors) is a new built-in that turns the
> > > field separating commas into something else, specified by the user. The
> > > program could verify that the input does not contain this separator
> > > character.
> > >
> > > Of course, doing both is also possible, as long as there are no quoted
> > > CRLFs in inputRange CSV n.
> > >
> > > Preferences, anyone?
> >
> > I don't see how we can do it right when transfer has already interpreted
> > all line breaks to build separate records, including the line breaks that
> > were imbedded in strings. And how do we want to represent the CRLF as part
> > of the string in the EBCDIC domain? By x25 or x15 or so?  It's a bit harder
> > than 'joincont not trailing /"/ x15'  if we want to handle the CRLF between
> > two escaped quotes :-)
> >
> > I'm not convinced CSV for scanning fields in the input range would be a big
> > plus, considering that you might also want to produce CSV format. Wouldn't
> > we rather deblock the stream into one-field-per-record, maybe stripped of
> > the quotes, and possibly prefixed with the field number or separated by a
> > null record? The reverse process would make it into CSV format again, and
> > transfer would do record boundaries and embedded x25 as line breaks.
> >
> > Rob
> > ---
> > Rob van der Heij
> > z/VM Development, CMS Pipelines
> >
>
