Jason Trenouth <[EMAIL PROTECTED]> wrote:

    Eg compare:
        some_thing-some_thing_else-a_third_thing
    versus:
        SomeThing-SomeThingElse-AThirdThing

This sounds like an argument for the Ada Quality & Style Guidelines rule to treat operators as "words" just like identifiers and put spaces around them:

        some_thing - some_thing_else - a_third_thing
    versus:
        SomeThing - SomeThingElse - AThirdThing

which, if nothing else, makes it clear that operator spacing must be controlled in any study of identifier readability.

    One of the benefits of 'camel case' in the programming language context is that it tends to squeeze out the possibility that something else might be going on.

Two questions here. One is "Where does this CAMEL CASE thing come from?" It used to be called "studly caps", which is why I've added the African gender prefix to make "baStudlyCaps". The other is HOW does baStudlyCaps "tend.. to squeeze out the possibility that something else might be going on"? For example, if we had a programming language (and if there isn't a suitable one, we could always make one up) in which a capital letter always began a new token and, as in APL, user-defined prefix and infix operators used identifiers just like variables, SomeThingElse could be how you write (Some) Thing (Else).

The reasons that I know for various spelling conventions are historic:

Fortran was devised in the days of 6-bit character sets (BCD, BCL, &c) and had only the upper case letters to deal with; it was also the era of 36-bit words and microscopic memories, hence the 6-character limit on names. However, identifiers were allowed to have blanks inside, so "STRT TM" and "FNSH TM" were possible identifiers.

COBOL was also devised in that era, and was supposedly intended to be readable by people who were not skilled in programming. (And let's face it, the idea of letting auditors audit accounting code is not a bad one.) COBOL copied the use of hyphens, which it could because the hyphen-minus was in the character set, and could get away with it because the use of minus for subtraction was rare. (I don't know when the COMPUTE verb entered COBOL; I suspect the first COBOL didn't have it, so minus would never have been used for subtraction.)
Algol 60 was devised by people who had at least heard of larger character sets and didn't have any standard character set to appeal to, hence the distinction between "publication" format, in which you could use anything you like up to and including letters from other scripts than Latin, and what you really used in programming. Algol 60 (and even its successor Algol 68) also allowed blanks inside identifiers, so "start time" and "finish time" were possible identifiers. Using the hyphen-minus as a word separator wasn't an option because the Algols were "ALgebraic" languages where the character was needed for subtraction.

Pascal was devised for a CDC machine with a 6-bit character set, so it reverted to upper case only. The character set did not include anything suitable for use as a word separator other than the blank. Experience with Algol 60 and Algol 68 had shown that distinguishing keywords from identifiers could be a problem, which Pascal solved by reserving the keywords and forbidding embedded blanks. It apparently never occurred to Wirth to devise a language in which there weren't any reserved words; with only a 6-bit character set that's not surprising.

The first major language I know to use underscores in identifiers was PL/I, which was devised for a 60-character character set that DID include a visually unobtrusive character that could be used as a word separator. Burroughs Algol also allowed it, because the Burroughs machines also used the EBCDIC character set. (Yes, I know EBCDIC was an 8-bit character set. Just because the character set has provision for lower case letters doesn't mean your keypunches or print chains do.)

There are two main Lisp traditions. The Interlisp tradition used dots to separate words. The MacLisp tradition uses hyphens to separate words in identifiers, like COBOL, because there are no infix operators to worry about. (Their functions are all available, but not using operator syntax.)
I personally find multi-word-identifiers-like-this especially readable, but the point here is that the absence of infix operators removed a constraint and *allowed* them to do this. In contrast, Interlisp sort of did have operators: if you used an identifier that wasn't defined, the Do-What-I-Mean facility would have a crack at it and try, amongst other things, parsing it as an expression. Interlisp didn't use hyphen separators because it couldn't.

Java uses baStudlyCaps. That seems to have been copied from common (but not universal) C++ practice. And THAT seems to have been copied from Smalltalk by people who didn't understand why Smalltalk used it.

Smalltalk was designed in the 70s by people using, initially, Xerox Altos. If you've ever seen an Alto, or a good picture of an Alto keyboard, you will know that although the Alto's character set included both lower case and upper case letters, it did not include an underscore. In fact the character set was a hybrid of ASCII 67 and ASCII 63 (or whatever the dates were): what is now "_" was left arrow and what is now "^" was up arrow, both of which Smalltalk used (for ":=" and "return" respectively). So Smalltalk didn't allow underscores inside identifiers because of a technological limitation on the typography: it couldn't, there was no such character. Since Smalltalk syntax allows sequences of words, "self owner boundingBox area" meaning "send #area to the result of sending #boundingBox to the result of sending #owner to myself", allowing spaces inside identifiers wasn't an option, so it had to be baStudlyCaps or justrunonwords. The Smalltalk books never claimed that baStudlyCaps was _better_ than anything; it was all they had.

So to me it seems as though baStudlyCaps is just a wave of fashion based on a misunderstanding of a response to a technological limitation as a deliberate preference.
But I am often wrong, and it's part of my job to be evidence-based when I can: if there is any evidence that baStudlyCaps is in some way better, I want to know about it.

These days, it would seem possible to mark the distinction between reserved words (including words used as operators) and identifiers typographically, so the reasons against allowing spaces in identifiers would seem to be obsolete. Syntax colouring in IDEs is really the same idea. Speaking of which, has anyone studied that? Are there any results on which colours are better for what kinds of tokens? (Yes, this is still about identifier readability.)

----------------------------------------------------------------------
PPIG Discuss List (discuss@ppig.org)
Discuss admin: http://limitlessmail.net/mailman/listinfo/discuss
Announce admin: http://limitlessmail.net/mailman/listinfo/announce
PPIG Discuss archive: http://www.mail-archive.com/discuss%40ppig.org/