The rules are more like the following:

* Underscores and em-dash are ignored, except that '_' on its own is a "don't care" identifier.
* The first char is case sensitive (CS); the others in the range A-Za-z are case insensitive (CI).
* Underscores and em-dash are _separators_.
* Backticks can be used to construct other identifiers, where everything inside the backticks has to be a valid token. Whitespace between the tokens is ignored.

That's 4 rules, and the backtick rule is mostly irrelevant in practice. For example, in Java you can write either π or `\u03C0`. Does that mean I need to worry that my hypothetical Java code will become unreadable anytime soon? Hardly.
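
To make the case and separator rules concrete, here is a rough Python sketch of how two identifiers could be compared under them. The names `normalize` and `same_identifier` are made up for illustration, the backtick rule is left out, and this is only my reading of the rules above, not how any real compiler implements them.

```python
EM_DASH = "\u2014"  # assumption: the em-dash separator mentioned in the rules


def normalize(ident: str) -> str:
    """Hypothetical helper: reduce an identifier to its comparison key."""
    if ident == "_":
        # A lone underscore is the "don't care" identifier; leave it alone.
        return ident
    # Separators (underscore, em-dash) are ignored for comparison.
    stripped = "".join(ch for ch in ident if ch not in ("_", EM_DASH))
    if not stripped:
        return ident
    first, rest = stripped[0], stripped[1:]
    # The first char stays case sensitive; only A-Z in the rest is folded.
    folded = "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in rest)
    return first + folded


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: True if both names denote the same identifier."""
    return normalize(a) == normalize(b)
```

With this sketch, `same_identifier("fooBar", "foo_bar")` holds, while `same_identifier("Foo", "foo")` does not, because the first character is compared case sensitively.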
