/usr/games/bcd formats its input as punched cards. It has an internal
table that maps each byte value to a 12-bit code, representing a pattern
of holes in one column of the card. The table is missing an element at
index 0x5c (ASCII '\\'), which causes the succeeding elements to be
misaligned. You can see it by looking at the rows that correspond to
uppercase and lowercase letters; punch card codes don't distinguish case
so uppercase and lowercase are supposed to map to the same code. But the
lowercase codes are shifted by one position:
        index 0x40 "@ABCD..." -> 0x022, 0x900, 0x880, 0x840, 0x820, ...
        index 0x60 "`abcd..." -> 0x900, 0x880, 0x840, 0x820, 0x810, ...
Now, in the specific case of lowercase letters, the error doesn't
matter, because the program calls toupper before doing the table lookup.
But the codes for 0x5c '\\', 0x5d ']', 0x5e '^', and 0x5f '_' are wrong,
compared to the table in an early version of bcd and a reference of
punch card codes I found. And indexes 0x60 '`' and 0x61 'a' map to the
same code, 0x900, because 'a' is affected by toupper and '`' is not.

The evident intention of the table is that the second half should be the
same as the first half; i.e., the most significant bit of the index does
not matter. But the missing element messes that up too.

Finally, the table has the code 0x30f at indexes 0x5f '_' and 0xdf (or
indexes 0x60 '`' and 0xe0 after correcting the misalignment). I suspect
the code 0x30f is nonsense--garbage data resulting from an out-of-bounds
read in an even older version of bcd.c. What code to use for it is
debatable, as '`' doesn't seem to have been used on punch cards, but
following historical example we can assign it 0x022, the same as index
0x40 '@'.

The included patch fixes the above problems. It modifies the table thus:
        Insert 0x806 at index 0x5c '\\', shifting later elements by one
        Change index 0x60 '`' from 0x30f to 0x022
        Change index 0xe0     from 0x30f to 0x022
        Trim the now-extraneous final element 0x000


Additional/background information:

Why code 0x806 for index 0x5c '\\'? Why code 0x022 for indexes 0x60 '`'
and 0xe0? I based these decisions on historical versions of bcd.c:

1975: V6 Unix
        https://github.com/dspinellis/unix-history-repo/blob/c2f18e5b8594a88a07258339e61d6e61bee1ae7d/usr/source/s1/bcd.c
1989: 4.3BSD-Reno, black-box rewrite, introduces the misalignment error
        https://github.com/dspinellis/unix-history-repo/commit/45c04b1797c40ca227563c4ef331df5e929f89e9
1993: fixes duplicate codes for 'Q' and 'R'
        https://github.com/dspinellis/unix-history-repo/commit/19432bc25b8c2443bc12d4a04ef98b525e837d19

The bit order in the 1975 table is reversed compared to the current
source code. If we flip it around, then:
        0x082, /* [ */
        0x806, /* \ */
        0x822, /* ] */
        0x600, /* ^ */
        0x282, /* _ */
So 0x806 was intended for '\\', and inserting the code at that index
also conveniently makes ']', '^', and '_' line up.

The 1975 program has only 64 elements in its lookup table. It shifts
part of the ASCII space to overlap the table, then does a poor man's
toupper, subtracting 32 from characters 'a' and higher:
        c = *spp++ - 040;
        if (c>='a'-040) c = c - 040;
This operation affects not only lowercase letters but also later
non-letter characters. It therefore assigns the same code to 0x5b '['
and 0x7b '{', to 0x5c '\\' and 0x7c '|', etc. Whether or not that's
intentional, and even though '`' precedes 'a' and is therefore
unaffected by the subtraction, it's consistent to assign to index
0x60 '`' the same code assigned to index 0x40 '@', 0x022.

The 1975 code does not actually assign *any* meaningful code to index
0x60 '`', because of a bug. When *spp is 0x60, c will be 0x40; i.e., 64,
one element past the end of the lookup table. I suspect the code
0x30f--which has too many bits set to be a punch card code--is actually
just garbage data from the end of the array. This online V6 Unix
emulator yields code 0x7ff for '`': http://pdp11.aiju.de/. Try:
        @unix
        # STTY -LCASE
        # bcd "`"

The 1989 version was derived from some earlier version in a black-box
manner: a shell script ran the old version for each byte value and
recorded the output. That may explain why the missing element is at
index 0x5c '\\': the '\\' character's special meaning may have caused a
bug in the script. That's just my guess, though. Thanks to Steve Hayman
for filling in some of the history on this.

After applying the included patch, the current codes match the 1975
codes in the first 128 indexes, except for the corrected overflow at
index 0x60 '`', and the indexes 0x52 'R' and 0x72 'r' that were
corrected in the 1993 version.

I don't know what encoding bcd is actually supposed to follow. I found
a page with many incompatible encodings:
http://homepage.divms.uiowa.edu/~jones/cards/codes.html. Of these, the
"General Electric" code comes closest. Before the patch, bcd differs
from it in five characters: '+', '\\', ']', '^', and '_'. After the
patch, it differs in only two: '+' and '^'. The 1975 code had the same
differences in the '+' and '^' characters.

x = holes in bcd only
+ = holes in General Electric only
. = holes in both
 ________________________________________________________________
/ &-0123456789ABCDEFGHIJKLMNOPQR/STUVWXYZ[#@:>?+.](<\^$*);'_,%="!
| .           .........                        ......            
|  .                   .........                     ......      
|   .                           .........      x     x     ......
|    .        .        .        .                                
|     .        .        .        .       .     +     +     .     
|      .        .        .        .       .     .     .     .    
|       .        .        .        .       .     .     .     .   
|        .        .        .        .       .     .     .     .  
|         .        .        .        .       .     .     .     . 
|          .        .        .        .       .     .     .     .
|           .        .        .        . ......+.....+...........
|            .        .        .        .                        
|________________________________________________________________




Index: bcd.c
===================================================================
RCS file: /cvs/src/games/bcd/bcd.c,v
retrieving revision 1.25
diff -u -p -u -p -r1.25 bcd.c
--- bcd.c       7 Mar 2016 12:07:55 -0000       1.25
+++ bcd.c       21 Jan 2018 06:10:51 -0000
@@ -81,27 +81,27 @@ u_short holes[256] = {
     0x022,      0x900,   0x880,   0x840,   0x820,   0x810,   0x808,   0x804,
     0x802,      0x801,   0x500,   0x480,   0x440,   0x420,   0x410,   0x408,
     0x404,      0x402,   0x401,   0x280,   0x240,   0x220,   0x210,   0x208,
-    0x204,      0x202,   0x201,   0x082,   0x822,   0x600,   0x282,   0x30f,
-    0x900,      0x880,   0x840,   0x820,   0x810,   0x808,   0x804,   0x802,
-    0x801,      0x500,   0x480,   0x440,   0x420,   0x410,   0x408,   0x404,
-    0x402,      0x401,   0x280,   0x240,   0x220,   0x210,   0x208,   0x204,
-    0x202,      0x201,   0x082,   0x806,   0x822,   0x600,   0x282,   0x0,
+    0x204,      0x202,   0x201,   0x082,   0x806,   0x822,   0x600,   0x282,
+    0x022,      0x900,   0x880,   0x840,   0x820,   0x810,   0x808,   0x804,
+    0x802,      0x801,   0x500,   0x480,   0x440,   0x420,   0x410,   0x408,
+    0x404,      0x402,   0x401,   0x280,   0x240,   0x220,   0x210,   0x208,
+    0x204,      0x202,   0x201,   0x082,   0x806,   0x822,   0x600,   0x282,
     0x0,        0x0,     0x0,     0x0,     0x0,     0x0,     0x0,     0x0,
     0x0,        0x0,     0x0,     0x0,     0x0,     0x0,     0x0,     0x0,
     0x0,        0x0,     0x0,     0x0,     0x0,     0x0,     0x0,     0x0,
     0x0,        0x0,     0x0,     0x0,     0x0,     0x0,     0x0,     0x0,
-    0x206,      0x20a,   0x042,   0x442,   0x222,   0x800,   0x406,   0x812,
-    0x412,      0x422,   0xa00,   0x242,   0x400,   0x842,   0x300,   0x200,
-    0x100,      0x080,   0x040,   0x020,   0x010,   0x008,   0x004,   0x002,
-    0x001,      0x012,   0x40a,   0x80a,   0x212,   0x00a,   0x006,   0x022,
-    0x900,      0x880,   0x840,   0x820,   0x810,   0x808,   0x804,   0x802,
-    0x801,      0x500,   0x480,   0x440,   0x420,   0x410,   0x408,   0x404,
-    0x402,      0x401,   0x280,   0x240,   0x220,   0x210,   0x208,   0x204,
-    0x202,      0x201,   0x082,   0x806,   0x822,   0x600,   0x282,   0x30f,
-    0x900,      0x880,   0x840,   0x820,   0x810,   0x808,   0x804,   0x802,
-    0x801,      0x500,   0x480,   0x440,   0x420,   0x410,   0x408,   0x404,
-    0x402,      0x401,   0x280,   0x240,   0x220,   0x210,   0x208,   0x204,
-    0x202,      0x201,   0x082,   0x806,   0x822,   0x600,   0x282,   0x0
+    0x0,        0x206,   0x20a,   0x042,   0x442,   0x222,   0x800,   0x406,
+    0x812,      0x412,   0x422,   0xa00,   0x242,   0x400,   0x842,   0x300,
+    0x200,      0x100,   0x080,   0x040,   0x020,   0x010,   0x008,   0x004,
+    0x002,      0x001,   0x012,   0x40a,   0x80a,   0x212,   0x00a,   0x006,
+    0x022,      0x900,   0x880,   0x840,   0x820,   0x810,   0x808,   0x804,
+    0x802,      0x801,   0x500,   0x480,   0x440,   0x420,   0x410,   0x408,
+    0x404,      0x402,   0x401,   0x280,   0x240,   0x220,   0x210,   0x208,
+    0x204,      0x202,   0x201,   0x082,   0x806,   0x822,   0x600,   0x282,
+    0x022,      0x900,   0x880,   0x840,   0x820,   0x810,   0x808,   0x804,
+    0x802,      0x801,   0x500,   0x480,   0x440,   0x420,   0x410,   0x408,
+    0x404,      0x402,   0x401,   0x280,   0x240,   0x220,   0x210,   0x208,
+    0x204,      0x202,   0x201,   0x082,   0x806,   0x822,   0x600,   0x282,
 };
 
 /*
