levels of abstraction (was: 16-bytes the same)

john gilmore Wed, 06 Oct 2010 05:04:50 -0700

This is one of my ruminations, and if you find them boring you may safely skip 
the rest of this post.
 
The subject of this thread in all of its particularity is not of much interest, 
but it is of great interest if it is viewed more abstractly.  This time, 
moreover, it is easy to generalize.
 
Most of the posts dealt with blanks, x'40' in EBCDIC and x'20' in ASCII, but 
repetitions of nuls, x'00' in both, are more important in a few contexts.  
 
The limitation to 16 characters,  one instance and 15 repetitions of it, is 
artificial.  
 
Finally, the limitation to a single character is stultifying: many apparently 
different but in fact conceptually identical problems require that repetitions 
of any of a subset of characters be identified.
 
Consider the two PL/I statements, useful because they are context-sensitive:
 
declare transac3 file record sequential buffered ;
 
open file(transac3) input      ;
 
In  both we want to break out the tokens, 'declare', 'transac3', 'file', 
'record', 'sequential', 'buffered', ';' in the first and 'open', 'file', '(', 
'transac3', ')', 'input', ';' in the second.  Here, as Robin has already 
pointed out,  a TRT is better than a CLC[L].  
 
Indeed, well-written lexical breakout routines consist of little more than a 
small finite-state-machine and a set of TRT tables.  In particular, they do not 
process inputs one character at a time (as the computer science 101 
illustrations always do).
 
There is a sense in which this is well known.  The current Principles of 
Operation, in its discussion of the TRT instruction, says
 
TRANSLATE AND TEST may be used to scan the first operand for characters with 
special meaning. The second operand, or list, is set up with all-zero function 
bytes for those characters to be skipped over and with nonzero function bytes 
for the characters to be detected.
 
For my example it is thus possible to both stop on [any of] a blank, a left 
parenthesis, a right parenthesis, or a semicolon and to distinguish them 
(without further testing) after stopping.
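
As a rough illustration of the point above, here is a minimal Python sketch of a TRT-style scan applied to the two PL/I statements. It is a model of the idea only, not the instruction itself: the names trt, make_table, tokens and DELIMS and the function-byte values 1 to 4 are all hypothetical, the table is indexed by ASCII code points for convenience (an EBCDIC table would simply have its nonzero entries at other offsets), and the Python loop stands in for what a single TRT does.

```python
# Hypothetical function-byte codes; any distinct nonzero values would do.
BLANK, LPAREN, RPAREN, SEMI = 1, 2, 3, 4

def make_table(stops):
    """Build a 256-byte table: zero means skip, nonzero means stop."""
    table = bytearray(256)
    for ch, code in stops.items():
        table[ord(ch)] = code
    return bytes(table)

def trt(data, table, start=0):
    """Model of a TRT-style scan.

    Returns (index, function_byte) for the first byte at or after start
    whose table entry is nonzero, or (len(data), 0) if there is none.
    """
    for i in range(start, len(data)):
        code = table[data[i]]
        if code:
            return i, code
    return len(data), 0

# One table stops on any of the four delimiters and tells them apart.
DELIMS = make_table({" ": BLANK, "(": LPAREN, ")": RPAREN, ";": SEMI})

def tokens(text):
    """Break a statement into tokens, one scan per token."""
    data = text.encode("ascii")
    out = []
    pos = 0
    while pos < len(data):
        stop, code = trt(data, DELIMS, pos)
        if stop > pos:                  # text before the delimiter is a token
            out.append(data[pos:stop].decode("ascii"))
        if code not in (0, BLANK):      # '(', ')' and ';' are tokens themselves
            out.append(chr(data[stop]))
        pos = stop + 1
    return out

print(tokens("declare transac3 file record sequential buffered ;"))
print(tokens("open file(transac3) input      ;"))
```

The function byte returned with the stopping position is what lets the caller distinguish a blank from a parenthesis or a semicolon without testing the character again.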
 
Why use a TRT, which requires a table of, usually, 256 bytes, when something 
more compact can be put together for the special case of 16 blanks (or nuls)?  
The answer can be boiled down to a single word,  reusability.   
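
To make the reusability argument concrete, here is a small Python sketch, again a model rather than real TRT code. The names trt, all_but and is_run are hypothetical. The same scan routine answers the original "16 bytes the same" question for EBCDIC blanks, for nuls, or for a mixture of the two, and for any field length, merely by being handed a different table.

```python
def trt(data, table):
    """Model of a TRT-style scan: return (index, function_byte) for the
    first byte whose table entry is nonzero, or (len(data), 0) if none."""
    for i, byte in enumerate(data):
        if table[byte]:
            return i, table[byte]
    return len(data), 0

def all_but(*skipped):
    """Build a table that skips the given byte values and stops on all others."""
    table = bytearray([1]) * 256
    for b in skipped:
        table[b] = 0
    return bytes(table)

NOT_EBCDIC_BLANK = all_but(0x40)          # stop on anything but x'40'
NOT_NUL = all_but(0x00)                   # stop on anything but x'00'
NOT_BLANK_OR_NUL = all_but(0x40, 0x00)    # stop on anything but either

def is_run(field, table):
    """True if the scan runs off the end of the field without stopping."""
    return trt(field, table)[0] == len(field)

print(is_run(b"\x40" * 16, NOT_EBCDIC_BLANK))            # sixteen EBCDIC blanks
print(is_run(b"\x40" * 15 + b"\xC1", NOT_EBCDIC_BLANK))  # last byte differs
print(is_run(b"\x40" * 8 + bytes(8), NOT_BLANK_OR_NUL))  # blanks then nuls
```

Only the table changes from one question to the next; the scanning code is written once.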
 
Efficiency is a vexed question, and I cannot consider it here in any really 
satisfactory way.  No one wants to write inefficient or inelegant code, but 
preoccupation with the relative efficiencies of alternative single 
instructions, both of which consume only nanoseconds, is a mug's game.  Worse, 
the avoidance of single millicode-based instructions, their replacement by a 
notionally and temporarily more efficient sequence of hardware-based ones, is, 
I think, perverse.


Enough!  I have already offended just about everyone.
 
John Gilmore Ashland, MA 01721-1817 USA
