On Sun, 22 Apr 2018 18:29:38 -0400, Hobart Spitz wrote:
>
>*With the *nix/C record and string models, there are these issues:*
>
>   1. Errant/unexpected/unintended pieces of binary data in a text
>   file/string can break something.
>   2. Separate functions/methods/techniques must be used to manipulate text
>   files/strings versus binary files/string. You *must* know what you are
>   dealing with up front, and/or somehow code logic for both. (I'm not sure
>   the latter is possible in the general case.) 
>
Gee.  The program must know the format of the data it's dealing with.
Hardly a surprise.

>   3. Even with *nix/C oriented machine instructions, the need to inspect
>   all characters up to a target point results in performance killing cache
>   flooding.
>   
This flaw is also present in the pervasive RECFM=FB,LRECL=80.  Consider
the FORTRAN statement:
      PI = 3.14           265
... the compiler must inspect the line to the end of the record for
additional digits.  Variable length records relieve this.  Yet FORTRAN
(still, AFAIK), HLASM, and Utility control files must be FB 80.
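
A minimal C sketch of that cost, assuming fixed-form rules (blanks are
insignificant and the statement field ends at column 72).  The name
squeeze_blanks is made up for illustration; it is not taken from any
real compiler.

```c
#include <stddef.h>

/* Copy a statement field to 'out' with the blanks squeezed out, the way
   fixed-form FORTRAN ignores them.  Returns the number of input bytes
   that had to be inspected, which is always 'len'.
   'out' must have room for len + 1 bytes.  (Hypothetical helper.) */
size_t squeeze_blanks(const char *rec, size_t len, char *out)
{
    size_t inspected = 0;
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        inspected++;                 /* every column must be looked at */
        if (rec[i] != ' ')
            out[n++] = rec[i];
    }
    out[n] = '\0';
    return inspected;
}
```

Called with a blank-padded card image and len = 72, it inspects all 72
columns before it can conclude that the constant is 3.14265.  Called
with a variable length record whose trailing blanks were never stored,
len is only the length of the text actually present.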

And quote-delimited strings aggravate the problem.  FORTRAN had this
solved, as in:
      WRITE ( 6, 100 )
100 FORMAT( 13HHello, world. / )
... on encountering the 13H, the compiler can just MVC 13 bytes to
SYSPUNCH without inspecting them.  Subsequently, designers came to
believe that silicon is cheaper than carbon, so in C:
        fprintf( stdout, "Hello, world.\n" );
... the compiler, not the programmer, can count characters in the
string.  Of course, the FORTRAN scheme was easier when the programmer
coded on a form with columns numbered and handed it off to Data Entry
to be punched and verified.
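
The difference between the two schemes can be sketched in C.  Both
function names are invented for illustration, and the quoted-string
parser deliberately ignores escape sequences, so take this as a rough
model rather than how any real compiler does it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse a Hollerith constant such as "13HHello, world." found at 'src'.
   The text is copied by count, without looking at its bytes.
   Returns the text length, or 0 if malformed or too large for 'out'. */
size_t parse_hollerith(const char *src, size_t src_len,
                       char *out, size_t out_size)
{
    size_t i = 0;
    size_t count = 0;

    while (i < src_len && isdigit((unsigned char)src[i])) {
        count = count * 10 + (size_t)(src[i] - '0');
        if (count > src_len)         /* cannot possibly fit in the record */
            return 0;
        i++;
    }
    if (i == 0 || i >= src_len || src[i] != 'H')
        return 0;
    i++;                             /* step over the 'H' */
    if (count == 0 || count > src_len - i || count >= out_size)
        return 0;
    memcpy(out, src + i, count);     /* body is moved, not inspected */
    out[count] = '\0';
    return count;
}

/* Parse a quote-delimited string starting at 'src'.  Every byte must be
   compared against the closing quote.  No escape handling.
   Returns the text length; 0 means empty, malformed, or too large. */
size_t parse_quoted(const char *src, size_t src_len,
                    char *out, size_t out_size)
{
    size_t n = 0;

    if (src_len < 2 || src[0] != '"' || out_size == 0)
        return 0;
    for (size_t i = 1; i < src_len; i++) {
        if (src[i] == '"') {         /* found the closing quote */
            out[n] = '\0';
            return n;
        }
        if (n + 1 >= out_size)
            return 0;
        out[n++] = src[i];
    }
    return 0;                        /* unterminated string */
}
```

A side effect of the counted form is that the text may contain quote
characters, or anything else, without any escaping convention.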

ISPF addressed the waste of storing blanks with compressed format.
Characteristically, IBM introduced this at the wrong implementation
layer.  It should have been done at the access method layer or even
the control unit so it would be transparent to all programs.
Compression should have been indicated by a flag in the data set
label, not by a "magic number" in the data, which is susceptible to
being broken by "[e]rrant/unexpected/unintended pieces of binary
data".
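
To illustrate the hazard, here is a small C sketch.  The two-byte
PACK_MAGIC value and the dataset_label structure are pure inventions
for the example; they are not the actual ISPF packed-data signature or
any real DSCB layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-band signature; NOT the real ISPF value. */
static const unsigned char PACK_MAGIC[2] = { 0x00, 0x01 };

/* In-band detection: guess from the leading bytes of the data itself.
   Any unpacked data that happens to start with the same bytes is
   misidentified as packed. */
bool looks_packed_in_band(const unsigned char *data, size_t len)
{
    return len >= sizeof PACK_MAGIC &&
           memcmp(data, PACK_MAGIC, sizeof PACK_MAGIC) == 0;
}

/* Hypothetical stand-in for a data set label carrying the attribute. */
struct dataset_label {
    bool packed;
};

/* Out-of-band detection: the answer never depends on the content. */
bool is_packed_out_of_band(const struct dataset_label *label)
{
    return label->packed;
}
```

An unpacked member whose first two bytes happen to be X'0001' is
reported as packed by the first test and correctly as unpacked by the
second.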

There's considerable merit in the UN*X/C scheme of using variable
length records and using the TAB character rather than multiple blanks
for column alignment.  It's easier to type, it's fewer characters for
the compiler to scan, and it economizes storage.
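
The storage argument can be made concrete with a short C sketch that
expands TABs back into blanks.  The function name expand_tabs and the
choice of tab stops every tab_width columns are assumptions made for
the example.

```c
#include <stddef.h>

/* Expand each TAB in 'in' to blanks up to the next tab stop, with
   stops every 'tab_width' columns.  Returns the expanded length, or
   (size_t)-1 if 'out' is too small or tab_width is 0.
   (Hypothetical helper.) */
size_t expand_tabs(const char *in, char *out, size_t out_size,
                   size_t tab_width)
{
    size_t col = 0;

    if (tab_width == 0 || out_size == 0)
        return (size_t)-1;
    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            size_t next = (col / tab_width + 1) * tab_width;
            if (next >= out_size)
                return (size_t)-1;
            while (col < next)
                out[col++] = ' ';    /* pad to the tab stop */
        } else {
            if (col + 1 >= out_size)
                return (size_t)-1;
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```

With stops every 8 columns, the 10-byte line "\tPI = 3.14" expands to
17 bytes, so the TAB form stores and scans 7 fewer characters for that
one line.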

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
