Writing to follow up on this topic. I committed utf8 limits for data names, with an exception for file paths. This applies to the UI and the PyAPI, but I found some interesting things while working on this.

Python has a number of error callbacks to handle incompatible chars when encoding and decoding. From the C API:
- PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u), PyUnicode_GET_SIZE(u), "surrogateescape");
- PyUnicode_DecodeUTF8(str, strlen(str), "surrogateescape");

See http://docs.python.org/py3k/library/codecs.html#codecs.register for more info.

This means an invalid byte gets converted to an escape such as \udce9 (a lone surrogate) instead of raising an error.

This makes the whole problem seem very simple: use these as fallbacks for _PyUnicode_AsString and PyUnicode_FromString and we ALWAYS get a valid string from ANY C char * array (tested with random byte arrays).

So I converted the RNA API to use these and all was fine; you can assign invalid unicode values like this:

  bpy.context.object.name = "num\udce9ro"  # rather than "numéro", but they are the same internally

This works for getting and setting RNA, but I wasn't able to use these strings: print() and writing to a file would raise errors. So basically the problem is moved to Python.

The simplest way I could find to print an object named "num\udce9ro" was this...

  print(somestring.encode("ASCII", "surrogateescape").decode("ASCII", "ignore"))

...and even this has the unicode chars stripped, so it's not that useful if you want a unique value.

So unless I'm missing something, these strings are such a pain to deal with in Python that it would be better to use byte arrays:
- b"EveryStringHasA_b_prefix"

If we allow these strings, script writers would just ignore this corner case and we'd get bug reports about it every so often. This led me back to the conclusion to enforce utf8 for all data names.

Nevertheless, these annoying strings still have to be taken into account with paths. An example of the problem is that the OBJ exporter can write to the path but throws an error when trying to print() it or write it to a file.
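
For anyone who wants to try this outside Blender, here is a minimal sketch in plain Python 3 of the behaviour described above. It assumes the bad name is "numéro" stored as Latin-1 bytes, and the helper name printable_name is made up for illustration; it is not part of the PyAPI.

```python
# "numéro" encoded as Latin-1, so the bytes are NOT valid UTF-8.
raw = b"num\xe9ro"

# Decoding with "surrogateescape" never fails: the bad byte 0xE9
# is mapped to the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
round_trip = name.encode("utf-8", "surrogateescape")

# A strict encode (as happens when writing to a UTF-8 file) raises,
# which is the "problem is moved to Python" part.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False


def printable_name(s):
    """Hypothetical helper using the workaround from this mail.

    Lossy: escaped bytes are dropped, so different names can collapse
    to the same result. It only handles ASCII plus escaped bytes.
    """
    return s.encode("ascii", "surrogateescape").decode("ascii", "ignore")


safe = printable_name(name)
```

The round trip is lossless, but the printable form drops the escaped byte entirely, so it cannot serve as a unique identifier.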

The only thing that's left to do is go over the scripts and make sure they work with non-utf8 paths, and make sure new ID names derived from paths are stripped.

- Campbell

On Sat, Aug 14, 2010 at 4:30 AM, Roger Wickes <[email protected]> wrote:
> I think that if you save "numéro" in your .blend, it does not matter what the
> OS UTF is;
> when you enter it into like the text editor or a field within the Blender UI,
> it only matters what the str function in Blender (that is processing that
> field) does when it is reading that field and saving it.
>
> I suggest that all string functions in Blender use UTF8 encoding,
> and save strings internally as a UTF8 array,
> so that the accent is preserved if you enter it as say, a mesh name.
>
> OS dependency is only relevant when, say, creating a folder or file. For that,
> Blender should use OS-defaultencoding as Campbell has said, when dealing with
> filenames and the absolutely idiotic slash/backslash conflict we have today.
> All OS encodings will respect your "numéro" as a filename/dirname/username,
> afaik.
>
> --Roger
>
> ----- Original Message ----
> From: Elia Sarti <[email protected]>
> To: bf-blender developers <[email protected]>
> Sent: Fri, August 13, 2010 6:47:01 AM
> Subject: Re: [Bf-committers] Proposal for handling string encoding in blender.
>
> The point is that different systems use different encodings. UTF-8 is
> just one way to encode multibyte characters, UTF-16 is another for
> instance (and there are hundreds others).
>
> Means if you save "numéro" in your .blend on an OS using utf-8 and
> someone opens it in one using utf-16 then the string is incompatible.
>
> I say +1 to this with an addendum.
> To some extent encoding can be detected and thus converted, would it be
> hard to do so for strings in the .blend?
> Of course only for a limited
> collection, I'd say utf-8 <-> utf-16 would probably suffice as I believe
> many linux distros use utf-8 while windows and mac use utf-16, so this
> would cover the majority of cases.
