On 12/28/2010 01:58 PM, Charles Oliver Nutter wrote:
> On Tue, Dec 28, 2010 at 12:21 PM, Per Bothner<p...@bothner.com> wrote:
>> Is there a plan/consensus for how to handle "illegal" characters
>> in identifiers? I'm primarily interested in the bytecode level,
>> not the Java source level. For example identifiers like '/'
>> used for division in Scheme. It would be good to have a standard
>> way to deal with this.
>
> See John Rose's post on this here:
> http://blogs.sun.com/jrose/entry/symbolic_freedom_in_the_vm
>
> We have implemented it in JRuby, and it works well. The down side is
> that Java backtraces can be a little hard to read when there's lots of
> symbolic identifiers.

A problem with this mangling is that it isn't "safe" for class names,
or at least not for class files. Using '\' in a filename is obviously
problematic, especially on Windows. On Posix-based file systems the
funny characters are in principle allowed, but will of course be
awkward to access from shells and other tools.

Windows disallows the following in file names:

  < (less than)
  > (greater than)
  : (colon)
  " (double quote)
  / (forward slash)
  \ (backslash)
  | (vertical bar or pipe)
  ? (question mark)
  * (asterisk)

http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx

(And of course we have problems with case-insensitive file systems.)

Now of course we can use an annotation to specify the source class
name in case the source class name is invalid - but then we still
need to mangle the class name somehow.

I think a better prefix character would be '%'. It's not reserved
for Posix or Windows or the JVM, while not being valid in a Java
identifier. Even better might be '~' or '!' since those are also
unreserved for URIs. I will assume '~' in the following.

If we want names that are "safe for filenames" or even "safe for URIs"
then the problem is that there are too many unsafe characters to
encode each one as '~' followed by a safe non-alphanumeric. Which
means that we need to use '~' followed by a *letter*. For example:

  '/' -> '~s' (mnemonic: slash)
  '.' -> '~d' (dot)
  '<' -> '~l' (less)
  etc etc

What about non-Ascii characters? I don't know enough to know if
such characters might cause a problem, but I don't know of any reason
they would. They might technically be disallowed by URIs, but my
impression is that %-mangling is handled somewhat universally and
semi-transparently.
-- 
	--Per Bothner
p...@bothner.com   http://per.bothner.com/
_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
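
To make the '~'-plus-letter proposal concrete, here is a minimal Java sketch of how such a mangler might look. Only the three mappings given as examples in the message ('/' -> '~s', '.' -> '~d', '<' -> '~l') come from the post. The class name TildeMangler, the method names, and the choice to escape a literal '~' as '~~' are assumptions made for illustration. The remaining characters (the "etc etc") are deliberately left unmapped rather than guessed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the '~' + letter scheme; not an existing API.
final class TildeMangler {
    private static final char ESCAPE = '~';

    // unsafe character -> code letter, and the reverse table for decoding
    private static final Map<Character, Character> TO_CODE = new HashMap<>();
    private static final Map<Character, Character> FROM_CODE = new HashMap<>();

    private static void register(char unsafe, char code) {
        TO_CODE.put(unsafe, code);
        FROM_CODE.put(code, unsafe);
    }

    static {
        // The three mappings given as examples in the message.
        register('/', 's'); // slash
        register('.', 'd'); // dot
        register('<', 'l'); // less
        // Assumption: a literal '~' must itself be escaped so that decoding
        // is unambiguous; "~~" is an illustrative choice, not from the post.
        register(ESCAPE, ESCAPE);
    }

    // Replaces each registered unsafe character with '~' plus its code.
    static String mangle(String name) {
        StringBuilder out = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            Character code = TO_CODE.get(c);
            if (code == null) {
                out.append(c); // unmapped characters pass through unchanged
            } else {
                out.append(ESCAPE).append(code.charValue());
            }
        }
        return out.toString();
    }

    // Reverses mangle(); rejects escape sequences this table does not know.
    static String demangle(String mangled) {
        StringBuilder out = new StringBuilder(mangled.length());
        for (int i = 0; i < mangled.length(); i++) {
            char c = mangled.charAt(i);
            if (c != ESCAPE) {
                out.append(c);
                continue;
            }
            if (i + 1 >= mangled.length()) {
                throw new IllegalArgumentException("dangling escape in: " + mangled);
            }
            char code = mangled.charAt(++i);
            Character original = FROM_CODE.get(code);
            if (original == null) {
                throw new IllegalArgumentException("unknown escape ~" + code);
            }
            out.append(original.charValue());
        }
        return out.toString();
    }
}
```

With this table, TildeMangler.mangle("/") yields "~s", and demangle(mangle(s)) returns s for any input string. A real implementation would need codes for the rest of the characters Windows forbids, plus some answer for case-insensitive file systems, neither of which this sketch attempts.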