On Tue, Dec 13, 2011 at 4:25 PM, Abel Deuring <abel.deur...@canonical.com> wrote:
> On 13.12.2011 10:19, Abel Deuring wrote:
>> On 13.12.2011 07:26, Robert Collins wrote:
>>
>>> We could say 'utf8' and leave it at that. Or we could say 'the
>>> printable subset of ascii' or some such. I'd just say non whitespace
>>> utf8, as strings are easier to deal with, and avoiding whitespace
>>> avoids most likely encoding issues.
>>
>> No arbitrary utf8/utf16 or anything non-ASCII please, or we may have
>> funny things like the attached script.

Agreed. There are too many issues with utf8, and if anything actually
starts making use of that space things will explode. In particular,
languages handle normalization differently, so you end up dealing with
byte strings in any case, and the identifiers may not round trip. For
instance, if you expect the byte string you put in to match the byte
string you get back out, you would need to encode the string to ASCII
before using it as a key in a database, using it as a filename, or
pasting it into a document.

They also need to be human consumable, so I'd restrict each chunk to
[a-zA-Z0-9]+, delimited by a well-known token, with an explicit
instance: "service-instance-type-id". (I chose '-' as the delimiter
because most systems treat the whole thing as one 'word', for example
when double-clicking on it in GNOME Terminal.)

I'm also tempted to say case insensitive or lowercase only.

--
Stuart Bishop <stuart.bis...@canonical.com>
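
To make the proposal above concrete, here is a rough Python sketch, not
a settled design. It assumes exactly four chunks, as in
"service-instance-type-id", and treats "lowercase only" as the canonical
form; the names IDENTIFIER_RE, is_valid_identifier and
canonical_identifier are made up for illustration. The last few lines
show the normalization problem: the same visible text can be two
different byte strings.

```python
import re
import unicodedata

# Assumption: four chunks of [a-zA-Z0-9]+ joined by '-'. The chunk count
# is a guess based on the "service-instance-type-id" example.
IDENTIFIER_RE = re.compile(r"[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+){3}")


def is_valid_identifier(identifier: str) -> bool:
    """Return True if the whole string matches the restricted format."""
    return IDENTIFIER_RE.fullmatch(identifier) is not None


def canonical_identifier(identifier: str) -> str:
    """Lowercase a valid identifier so comparisons are case insensitive."""
    if not is_valid_identifier(identifier):
        raise ValueError(f"invalid identifier: {identifier!r}")
    return identifier.lower()


# Why arbitrary utf8 is risky: one visible string, two byte strings.
composed = unicodedata.normalize("NFC", "caf\u00e9")    # e-acute as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' plus combining accent
round_trips = composed.encode("utf-8") == decomposed.encode("utf-8")  # False
```

With the ASCII-only format there is only one byte representation per
identifier once it has been lowercased, so what you store is what you
get back.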