@Araq: Simply by trying things out and attempting to make out some patterns in the observed behaviour, and by having a peek now and then at the resulting C source.
As you can see, at rule twelve I gave up trying to be more specific. It became too cluttered when I experimented with exclamation marks and dollar signs, especially in combination with underscores and en-dashes. So in my opinion it would indeed be better to build up the rules from the source. On the other hand: tests have the last word.
