On 12/31/2010 2:25 PM, Per Bothner wrote:
> On 12/28/2010 01:58 PM, Charles Oliver Nutter wrote:
>> On Tue, Dec 28, 2010 at 12:21 PM, Per Bothner<p...@bothner.com> wrote:
>>> Is there a plan/consensus for how to handle "illegal" characters
>>> in identifiers? I'm primarily interested in the bytecode level,
>>> not the Java source level. For example identifiers like '/'
>>> used for division in Scheme. It would be good to have a standard
>>> way to deal with this.
>> See John Rose's post on this here:
>> http://blogs.sun.com/jrose/entry/symbolic_freedom_in_the_vm
>>
>> We have implemented it in JRuby, and it works well. The down side is
>> that Java backtraces can be a little hard to read when there's lots of
>> symbolic identifiers.
> A problem with this mangling is that it isn't "safe" for class names,
> or at least not for class files. Using '\' in a filename is obviously
> problematical, especially on Windows. On Posix-based file systems the
> funny characters are in principle allowed, but will of course be awkward
> to access from shells and other tools.
>
> Windows disallows the following in file names:
> < (less than)
> > (greater than)
> : (colon)
> " (double quote)
> / (forward slash)
> \ (backslash)
> | (vertical bar or pipe)
> ? (question mark)
> * (asterisk)
> http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx
> (And of course we have problems with case-insensitive file systems.)
>
> Now of course we can use an annotation to specify the source class name
> in case the source class name is invalid - but then we still need to
> mangle the class name somehow.
>
> I think a better prefix character would be '%'. It's not reserved
> for Posix or Windows or JVM, while not being a valid Java character.
> Even better might be '~' or '!' since those are also unreserved for URIs.
> I will assume '~' in the following.
>
> If we want names that are "safe for filenames" or even "safe for URIs"
> then the problem is that there are too many unsafe characters to
> encode as '~' followed by a safe non-alphanumeric. Which means that
> we need to use '~' followed by a *letter*.
>
> For example:
> '/' -> '~s' (mnemonic: slash)
> '.' -> '~d' (dot)
> '<' -> '~l' (less)
> etc etc
>
> What about non-Ascii characters? I don't know enough to know if
> such characters might cause a problem, but I don't know of any reason
> they would. They might technically be disallowed by URIs, but my
> impression is that %-mangling is handled somewhat universally and
> semi-transparently.
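
to make the quoted '~' + letter idea concrete, here is a rough Java sketch. only the '/', '.' and '<' mappings come from the message above; the codes for '>' and for '~' itself, and the class name TildeMangler, are made up for illustration and are not part of any agreed scheme.

```java
// Sketch of the '~' + letter mangling from the quoted proposal.
// Only '/', '.' and '<' are taken from the thread; the codes for '>'
// and '~' are hypothetical placeholders.
final class TildeMangler {
    private static final char PREFIX = '~';
    // Parallel strings: RAW.charAt(i) is written as PREFIX + CODE.charAt(i).
    private static final String RAW  = "/.<>~";
    private static final String CODE = "sdlgt"; // 'g' and 't' are assumed

    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            int idx = RAW.indexOf(c);
            if (idx >= 0) {
                out.append(PREFIX).append(CODE.charAt(idx));
            } else {
                out.append(c); // everything else passes through unchanged
            }
        }
        return out.toString();
    }

    static String demangle(String mangled) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < mangled.length(); i++) {
            char c = mangled.charAt(i);
            if (c != PREFIX) {
                out.append(c);
                continue;
            }
            if (i + 1 >= mangled.length()) {
                throw new IllegalArgumentException("dangling '~' in " + mangled);
            }
            int idx = CODE.indexOf(mangled.charAt(++i));
            if (idx < 0) {
                throw new IllegalArgumentException("unknown escape in " + mangled);
            }
            out.append(RAW.charAt(idx));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("/"));         // ~s
        System.out.println(mangle("a.b<c"));     // a~db~lc
        System.out.println(demangle("a~db~lc")); // a.b<c
    }
}
```

escaping the prefix character itself is what keeps the mapping reversible. note that this table is deliberately incomplete: a real one would need an entry for every character in the Windows list above, since anything not in the table passes through as-is.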

just my quick comment...

in my VM, I ended up using a variation on JNI name-mangling for pretty
much anything needing mangling (including filenames...).

however, I did add a few additional escapes (for a few other common
characters), and ended up adding a _9xx escape in addition to the
_0xxxx escape.

list of other escapes:
'_' with '_1';
';' with '_2';
'[' with '_3';
'(' with '_4';
')' with '_5';
'/' with '_6'.

as well, '__' was used as a string-break (mostly when encoding a list of
strings as a single token).

so, nothing really says something similar couldn't be used for class
filenames as well, if needed...

dunno if this helps for anything...

_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
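
for what it's worth, the escapes listed above could be sketched in Java roughly like this. the class name JniStyleMangler is hypothetical, and two things are guesses that the post does not state: that _9xx carries two hex digits for characters below 0x100, and that only ASCII letters and digits pass through unescaped.

```java
// Sketch of a JNI-style mangler with the extra escapes listed in the post.
// ASSUMPTIONS (not stated in the post): _9xx is two hex digits for chars
// below 0x100, and only ASCII letters and digits are left unescaped.
final class JniStyleMangler {
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (isPlain(c)) {
                out.append(c);
                continue;
            }
            switch (c) {
                case '_': out.append("_1"); break;
                case ';': out.append("_2"); break;
                case '[': out.append("_3"); break;
                case '(': out.append("_4"); break;
                case ')': out.append("_5"); break;
                case '/': out.append("_6"); break;
                default:
                    if (c < 0x100) {
                        out.append(String.format("_9%02x", (int) c));  // assumed form
                    } else {
                        out.append(String.format("_0%04x", (int) c));  // JNI-style
                    }
            }
        }
        return out.toString();
    }

    private static boolean isPlain(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9');
    }

    // '__' as a string-break: every '_' produced by mangle() is followed by
    // a digit, so a doubled underscore cannot occur inside a mangled name.
    static String join(String... names) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            if (i > 0) {
                out.append("__");
            }
            out.append(mangle(names[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("java/lang/Object")); // java_6lang_6Object
        System.out.println(mangle("(I)V"));             // _4I_5V
        System.out.println(join("a_b", "c/d"));         // a_1b__c_6d
    }
}
```

the output only ever contains ASCII letters, digits and '_', which is why the same scheme can double as a filename encoding. it does nothing about case-insensitive file systems, though.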