Melinda asked me to pass this along. The end of that discussion in July of 1999 was this REXX stage by Michael Faulhaber:

/* CSVCLN REXX: brush-up a CSV file                                  */
/* (CSV = character separated values, a table transfer format)       */
/* Operation: change separating comma to TAB, single doubled quotes  */
/*            and strip quotes from quoted values                    */
/*                                                                   */
/*   +----------------------------------------------------------+   */
/*   |                                                          |   */
/*   | >>--CSVCLN--+------------------------------------+---->< |   */
/*   |             |--oldsep---+-----------------------+|       |   */
/*   |                         |--newsep---+----------+|        |   */
/*   |                                     |--nixbix--|         |   */
/*   |                                                          |   */
/*   +----------------------------------------------------------+   */
/*                                                                   */
/* OLDSEP specify the separation character used in input CSV,        */
/*        default is comma (,)                                       */
/* NEWSEP define the separation character in output,                 */
/*        default is TAB (hex 05)                                    */
/*        (OLDSEP and NEWSEP may be defined in hex or as a single    */
/*        character, but % or \ will fail)                           */
/* NIXBIX is used as a place-holder for treatment of double quotes   */
/*        its occurrence in input is not tested                      */
/* Note: if you specify a wrong OLDSEP, output is chopped at quotes  */
/*                                                  .....Mike 190299 */
/* ----------------------------------------------------------------- */
/* Change 050799: Now doubled speed by Melinda's suggestion.         */
/* Note 1: Now a wrong OLDSEP will chop at leading quotes.           */
/* Note 2: Now two new "nixbix" are hardcoded: x00 and x01 should    */
/*         not occur in input.                                       */
/* ----------------------------------------------------------------- */
trace o                                /* in case of stall           */
signal on novalue                      /* No uninitialised variables */
signal on failure                      /* Allow RC > 0 for a moment  */
'STAGENUM'                             /* Where are we?              */
first? = RC = 1                        /* first or not?              */
'MAXSTREAM IN'                         /* Check only one stream      */
signal on error                        /* now stop for any error     */
if RC ^= 0 then 'ISSUEMSG 264 PIPCSV'  /* too many streams: crash    */
if first?
   then 'ISSUEMSG 127 PIPSCV'          /* if first: stop and say why */
parse arg oc nc ph z                   /* args in mixed case         */
if z ^='' then 'ISSUEMSG 111 PIPSCV "' z '"'      /* too many args   */
if oc = '' then oc = ','               /* default old sep char       */
else do                                /* user defined sep char      */
   oc = strip(oc)                      /* no blanks around it        */
   l = length(oc)                      /* how many letters?          */
   select                              /* marginal test              */
      when l = 1 then nop              /* a single character is ok   */
      when l = 2 then oc = x2c(oc)     /* hex assumed                */
      otherwise 'ISSUEMSG 50 PIPSCV "' oc '"' /* invalid argument: stop */
end; end                               /* end Select and Else Do     */
if nc = '' then nc = '05'x             /* default new sep char       */
else do                                /* user defined sep char      */
   nc = strip(nc)                      /* no blanks around it        */
   l = length(nc)                      /* how many letters?          */
   select                              /* marginal test              */
      when l = 1 then nop              /* a single character is ok   */
      when l = 2 then nc = x2c(nc)     /* hex assumed                */
      otherwise 'ISSUEMSG 50 PIPSCV "' nc '"' /* invalid argument: stop */
end; end                               /* end Select and Else Do     */
if ph = '' then ph = 'NixBix'          /* default place holder for "" */
'CALLPIPE (sep % end \ name CSVCLN.REXX) *:', /* ------- in ------   */
   '% change /""""/'ph'/',             /* single inch-sign           */
   '% change /'oc'"""/'oc'"'ph'/',     /* leading inch-sign          */
   '% change 1.3 /"""/"'ph'/',         /* same in fst column         */
   '% change /'oc'""/'oc'/',           /* empty cells                */
   '% change 1.2 /""//',               /* same in fst column         */
   '% change /""/'ph'/',               /* example: "zizu = 8"" etc." */
   '% strip trailing' oc,              /* no empty cells at end of record */
   '% xlate 1-* 40 00',                /* Mask the blanks.           */
   '% tokenize /"/ x01',               /* Tokenize and delimit.      */
   '%q:outside /"/ /"/',               /* branch quoted values       */
   '% xlate 1-*' oc nc,                /* replace old by new sep-char */
   '%f:faninany',                      /* collect all parts of record */
   '% deblock linend 01 terminate',    /* Re-form original records.  */
   '% change /'ph'/"/',                /* place holder to inch-sign  */
   '% xlate 1-* 00 40',                /* Unmask the blanks.         */
   '%*:',                              /* ----------- out ----------- */
   '\q:',                              /* from OUTSIDE               */
   '% nfind "' ||,                     /* Get rid of the quotes.     */
   '%f:'                               /* to FANINANY                */
failure:; error: exit (RC * (RC ^= 12 & RC ^= 8))  /* RC = 0 if EOF  */

On 22 August 2015 at 11:28, Rod Furey <[email protected]> wrote:

> We went over this stuff about 17/18 years ago in Pipeline CForum. What did
> Melinda come up with then and would it be useful in this case?
>
> Rod
> On 21 Aug 2015 22:46, "Rob Van der Heij" <[email protected]> wrote:
>
> > > Thus, the only viable way to process CSV data correctly (i.e.,
> > > compensating for downloading errors) is a new built-in that turns the
> > > field separating commas into something else, specified by the user.
> > > The program could verify that the input does not contain this
> > > separator character.
> > >
> > > Of course, doing both is also possible, as long as there are no quoted
> > > CRLFs in inputRange CSV n.
> > >
> > > Preferences, anyone?
> >
> > I don't see how we can do it right when transfer has already interpreted
> > all line breaks to build separate records, including the line breaks that
> > were imbedded in strings. And how do we want to represent the CRLF as
> > part of the string in the EBCDIC domain? By x25 or x15 or so? It's a bit
> > harder than 'joincont not trailing /"/ x15' if we want to handle the CRLF
> > between two escaped quotes :-)
> >
> > I'm not convinced CSV for scanning fields in the input range would be a
> > big plus, considering that you might also want to produce CSV format.
> > Wouldn't we rather deblock the stream into one-field-per-record, maybe
> > stripped off the quotes, and possibly prefixed with the field number or
> > separated by a null record. The reverse process would make it into CSV
> > format again, and transfer would do record boundaries and embedded x25
> > as line breaks.
> >
> > Rob
> > ---
> > Rob van der Heij
> > z/VM Development, CMS Pipelines
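
For illustration only, the one-field-per-record idea quoted above could be sketched in plain REXX, outside any pipeline. This is not a CMS Pipelines built-in and not Michael's stage: the name CSVFLD, the routine splitcsv and the sample record are all hypothetical, and the sketch assumes a record that holds no embedded line breaks (the hard case Rob points out). It walks the record one character at a time, tracks whether it is inside quotes, and turns a doubled quote inside a quoted value into a single quote.

```rexx
/* CSVFLD REXX: hypothetical sketch, not part of CMS Pipelines       */
/* Split one CSV record into fields, one field per output line.      */
/* Assumption: no embedded line breaks inside quoted values.         */
parse arg record
if record = '' then record = 'a,"b,c","say ""hi""",,d' /* sample     */
n = splitcsv(record, ',')
do i = 1 to n
   say i':' field.i                    /* field number, then value   */
end
exit 0

splitcsv: procedure expose field.      /* fills field.1 .. field.n   */
   parse arg rec, sep
   n = 1
   field.1 = ''
   quoted = 0                          /* 1 while inside quotes      */
   i = 1
   do while i <= length(rec)
      c = substr(rec, i, 1)
      select
         when quoted = 1 & c == '"' & substr(rec, i + 1, 1) == '"' then do
            field.n = field.n || '"'   /* doubled quote: keep one    */
            i = i + 1                  /* skip the second quote      */
         end
         when c == '"' then quoted = 1 - quoted /* open or close     */
         when c == sep & quoted = 0 then do     /* real separator    */
            n = n + 1
            field.n = ''
         end
         otherwise field.n = field.n || c       /* ordinary character */
      end
      i = i + 1
   end
   field.0 = n
   return n
```

With the sample record, the five fields come out as a, then b,c, then say "hi", then an empty field, then d. A separator inside quotes stays part of its field, which is the case that the change and xlate stages in CSVCLN handle with place holders.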
