Re: PPIG discuss: Word separation and identifier readability

Richard A. O'Keefe Mon, 04 Sep 2006 16:04:14 -0700

Jason Trenouth <[EMAIL PROTECTED]> wrote:
        Eg compare:
        
            some_thing-some_thing_else-a_third_thing
        
        versus:
        
            SomeThing-SomeThingElse-AThirdThing
        
This sounds like an argument for the Ada Quality & Style Guidelines
rule to treat operators as "words" just like identifiers and put
spaces around them.


            some_thing - some_thing_else - a_third_thing

        versus:
        
            SomeThing - SomeThingElse - AThirdThing
        
which if nothing else makes it clear that operator spacing must be
controlled in any study of identifier readability.
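
The same point shows up at the lexical level. Below is a minimal sketch in Python, assuming two made-up lexical rules that are not taken from any particular compiler: one where a hyphen may sit inside a name (roughly the COBOL/Lisp convention) and one where it is always the minus operator. The names HYPHEN_IN_NAMES, HYPHEN_IS_MINUS and tokens are hypothetical.

```python
import re

# Rule A (assumed, COBOL/Lisp-like): a hyphen between name parts belongs to the name.
HYPHEN_IN_NAMES = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)*|-")

# Rule B (assumed, C-like): a hyphen is always a separate minus token.
HYPHEN_IS_MINUS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|-")


def tokens(rule, text):
    """Return the tokens that the given rule finds in text, skipping blanks."""
    return rule.findall(text)


unspaced = "some_thing-some_thing_else-a_third_thing"
spaced = "some_thing - some_thing_else - a_third_thing"

print(tokens(HYPHEN_IN_NAMES, unspaced))  # one long name
print(tokens(HYPHEN_IN_NAMES, spaced))    # three names and two minus signs
print(tokens(HYPHEN_IS_MINUS, unspaced))  # three names and two minus signs
```

Under the first rule the spacing alone decides between one identifier and a subtraction; under the second rule the spacing changes nothing for the lexer, only for the human reader.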

        One of the benefits of 'camel case' in the programming language
        context is that it tends to squeeze out the possibility that
        something else might be going on.

Two questions here.
One is "Where does this CAMEL CASE thing come from?"  It used to be
called "studly caps", which is why I've added the African gender
prefix to make "baStudlyCaps".

The other is HOW does baStudlyCaps "tend... to squeeze out the
possibility that something else might be going on"?  For example,
if we had a programming language (and if there isn't a suitable one,
we could always make one up) in which a capital letter always began
a new token and, as in APL, user defined prefix and infix operators
used identifiers just like variables, SomeThingElse could be how
you write (Some) Thing (Else).
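
A minimal sketch of such a made-up lexical rule, in Python. The function name split_at_capitals is hypothetical, and the rule (a capital letter always starts a new token) is the thought experiment above, not the behaviour of any real language.

```python
import re


def split_at_capitals(text):
    """Tokenise under the assumed rule that a capital letter always begins a token."""
    # A token is a capital followed by lower-case letters or digits,
    # or a leading run of lower-case letters or digits with no capital.
    return re.findall(r"[A-Z][a-z0-9]*|[a-z][a-z0-9]*", text)


print(split_at_capitals("SomeThingElse"))  # ['Some', 'Thing', 'Else']
print(split_at_capitals("baStudlyCaps"))   # ['ba', 'Studly', 'Caps']
```

With that rule in force, SomeThingElse is three tokens, so the capitals by themselves do not rule out "something else going on".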

The reasons that I know for various spelling conventions are historic:

    Fortran was devised in the days of 6 bit character sets (BCD, BCL, &c)
    and had only the upper case letters to deal with; it was also the era
    of 36-bit words and microscopic memories, hence the 6 character limit
    on names.  However, identifiers were allowed to have blanks inside, so
    "STRT TM" and "FNSH TM" were possible identifiers.

    COBOL was also devised in that era, and was supposedly intended to be
    readable by people who were not skilled in programming.  (And let's
    face it, the idea of letting auditors audit accounting code is not a
    bad one.)  COBOL copied the use of hyphens, which it could because the
    hyphen-minus was in the character set, and could get away with it because
    the use of minus for subtraction was rare.  (I don't know when the
    COMPUTE verb entered COBOL; I suspect the first COBOL didn't have it,
    so minus would never have been used for subtraction.)

    Algol 60 was devised by people who had at least heard of larger character
    sets and didn't have any standard character set to appeal to, hence
    the distinction between "publication" format in which you could use
    anything you like up to and including letters from other scripts than
    Latin, and what you really used in programming.  Algol 60 (and even its
    successor Algol 68) also allowed blanks inside identifiers, so
    "start time" and "finish time" were possible identifiers.  Using the
    hyphen-minus as a word separator wasn't an option because the Algols
    were algebraic languages (Algol 58 started out as the "International
    Algebraic Language") where the character was needed for subtraction.

    Pascal was devised for a CDC machine with a 6 bit character set, so it
    reverted to upper case only.  The character set did not include anything
    suitable for use as a word separator other than the blank.  Experience
    with Algol 60 and Algol 68 had shown that distinguishing keywords from
    identifiers could be a problem, which Pascal solved by reserving the
    keywords and forbidding embedded blanks.  It apparently never occurred
    to Wirth to devise a language in which there weren't any reserved words;
    with only a 6-bit character set that's not surprising.

    The first major language I know to use underscores in identifiers was
    PL/I, which was devised for a 60-character character set that DID
    include a visually unobtrusive character that could be used as a word
    separator.  Burroughs Algol also allowed it, because the Burroughs
    machines also used the EBCDIC character set.  (Yes, I know EBCDIC was
    an 8-bit character set.  Just because the character set has provision
    for lower case letters doesn't mean your keypunches or print chains do.)

    There are two main Lisp traditions.  The Interlisp tradition used dots
    to separate words.  The MacLisp tradition uses hyphens to separate
    words in identifiers, like COBOL, because there are no infix operators
    to worry about.  (Their functions are all available, but not using
    operator syntax.)  I personally find multi-word-identifiers-like-this
    especially readable, but the point here is that the absence of infix
    operators removed a constraint and *allowed* them to do this.  In
    contrast, Interlisp sort of did have operators:  if you used an
    identifier that wasn't defined, the Do-What-I-Mean facility would have
    a crack at it and try, amongst other things, parsing it as an expression.
    Interlisp didn't use hyphen separators because it couldn't.

    Java uses baStudlyCaps.  That seems to have been copied from common
    (but not universal) C++ practice.  And THAT seems to have been copied
    from Smalltalk by people who didn't understand why Smalltalk used it.

    Smalltalk was designed in the 70s by people using, initially, Xerox Altos.
    If you've ever seen an Alto, or a good picture of an Alto keyboard, you
    will know that although the Alto's character set included both lower case
    and upper case letters, it did not include an underscore.  In fact the
    character set was a hybrid of ASCII 67 and ASCII 63 (or whatever the
    dates were):  what is now "_" was left arrow and what is now "^" was
    up arrow, both of which Smalltalk used (for ":=" and "return" respectively).
    So Smalltalk didn't allow underscores inside identifiers because of a
    technological limitation on the typography:  it couldn't, there was no
    such character.  Since Smalltalk syntax allows sequences of words,
    "self owner boundingBox area" meaning "send #area to the result of
    sending #boundingBox to the result of sending #owner to myself",
    allowing spaces inside identifiers wasn't an option, so it had to be
    baStudlyCaps or justrunonwords.  The Smalltalk books never claimed that
    baStudlyCaps was _better_ than anything; it was all they had.
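
To make the "sequences of words" point concrete, here is a rough Python analogue of a chain of Smalltalk unary messages. It is only a sketch: the classes Box, Owner and Widget and the helper send_chain are hypothetical, and real Smalltalk message sending is not implemented this way.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Box:
    width: float
    height: float

    def area(self):
        return self.width * self.height


@dataclass
class Owner:
    box: Box

    def boundingBox(self):
        return self.box


@dataclass
class Widget:
    parent: Owner

    def owner(self):
        return self.parent


def send_chain(receiver, words):
    """Send each blank-separated unary message to the result of the previous one."""
    return reduce(lambda obj, msg: getattr(obj, msg)(), words.split(), receiver)


me = Widget(Owner(Box(3, 4)))
print(send_chain(me, "owner boundingBox area"))  # 12
```

Because a blank is what separates one message from the next, a name such as "bounding box" written with a blank would be read as two messages, which is why the words have to be run together somehow.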

So to me it seems as though baStudlyCaps is just a wave of fashion
based on a misunderstanding of a response to a technological limitation
as a deliberate preference.  But I am often wrong, and if there is any
evidence that baStudlyCaps is in some way better I would like to see it:
it's part of my job to be evidence-based when I can.

These days, it would seem possible to mark the distinction between
reserved words (including words used as operators) and identifiers
typographically, so the reasons against allowing spaces in identifiers
would seem to be obsolete.  Syntax colouring in IDEs is really the same
idea.  Speaking of which, has anyone studied that?  Are there any results
on which colours are better for what kinds of tokens?  (Yes, this is still
about identifier readability.)

 
----------------------------------------------------------------------
PPIG Discuss List (discuss@ppig.org)
Discuss admin: http://limitlessmail.net/mailman/listinfo/discuss
Announce admin: http://limitlessmail.net/mailman/listinfo/announce
PPIG Discuss archive: http://www.mail-archive.com/discuss%40ppig.org/
