2012/11/20 Richard Hipp <[email protected]>:
> I think that there needs to be two separate versions of
> fossil_utf8_to_unicode(), or perhaps a parameter on one function, so that
> some strings can get the conversion of problematic characters to the unicode
> private-use range, while others go unaltered.

Done now in [e6a1910fa8]. Only conversions involving filenames handle the
problematic characters; all other Unicode<->utf-8 conversions do not. I
think it's ready to be merged to trunk. See below for the tests I did.

> Some uses are less clear. For example, at
> http://www.fossil-scm.org/fossil/artifact/33d79f5b0a2?ln=850 we are
> converting an entire command-line, which does likely contain some filenames,
> but also might contain other text where *?"<>| are legitimate characters.

Thinking more on it: fossil_system() should not do this special conversion.
It only translates from utf-8 to unicode, while the started fossil instance
will just do the conversion the other way around. The two conversions just
need to be symmetrical, which means that fossil_system() must not do any
special magic and surely must not make any assumptions about what the
command line looks like.

Stefan, could you try out [e6a1910fa8]? I did the following tests with it,
and all work fine:

- Compiled [e6a1910fa8] on Linux (Ubuntu 12.04, 64-bit), cygwin and win32
  (with mingw-w64).
- Created a repository with 'strange' filenames containing characters
  like ':', '<', '[', '"' and even newline.
- With fossil [e6a1910fa8] everything works fine: I can check out all
  files from the repository on all platforms. On Cygwin/Windows, the
  correct filename translation is done: It looks strange from Windows
  Explorer, but from the cygwin shell everything looks as expected.
- With fossil 1.24, the files from the 'offending' commit cannot be
  checked out (we already knew that), but the commit which contained
  the 'invalid' filenames is also simply not visible at all! That's nice
  (something I didn't expect)!

So, after this change is on trunk, it operates well with older fossil
versions: 'offending' commits are completely invisible to older-fossil
users on the timeline, and even on the "branches" page. They wouldn't even
know the commit is there until they upgrade fossil and do a
"fossil all rebuild". I'm impressed!

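To illustrate the filename translation described above, here is a minimal
Python sketch of a cygwin-style mapping. It is not the code from
[e6a1910fa8]. The function names (to_windows_name, from_windows_name,
convert) are hypothetical, and the exact set of problematic characters is
an assumption. The 0xF000 offset into the private-use area is the one the
Cygwin documentation describes.

```python
# Hypothetical sketch; NOT fossil's actual implementation.
# Assumed set of characters that Windows rejects in filenames:
# " * : < > ? | plus the control characters 0x01-0x1F.
PROBLEMATIC = set('"*:<>?|') | {chr(c) for c in range(0x01, 0x20)}
PRIVATE_USE_OFFSET = 0xF000  # cygwin-style shift into the private-use area


def to_windows_name(name: str) -> str:
    """Shift each problematic character into the private-use range."""
    return "".join(
        chr(ord(ch) + PRIVATE_USE_OFFSET) if ch in PROBLEMATIC else ch
        for ch in name
    )


def from_windows_name(name: str) -> str:
    """Inverse mapping: restore characters shifted by to_windows_name()."""
    out = []
    for ch in name:
        code = ord(ch) - PRIVATE_USE_OFFSET
        if 0 < code < 0x80 and chr(code) in PROBLEMATIC:
            out.append(chr(code))
        else:
            out.append(ch)
    return "".join(out)


def convert(text: str, is_filename: bool) -> str:
    """Only filenames get the special treatment; other text is unaltered."""
    return to_windows_name(text) if is_filename else text
```

With this sketch, to_windows_name('a:b') replaces ':' (0x3A) with U+F03A,
and from_windows_name() restores it, so the two directions stay
symmetrical. A command line passed through convert(..., False) is left
untouched, which matches the reasoning about fossil_system() above.
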
I cannot think of any reason any more why [e6a1910fa8] could be bad for
fossil. All my tests have good results. I don't see any security risk in
doing this. I couldn't make fossil crash on this, not even with older
fossil versions. Richard, what more needs to be done to get this on trunk,
so that all discussions on which characters should be valid can finally be
put to rest?

The new (proposed) rule is then: The only forbidden character in filenames
is the backslash (and NUL and '/', of course). If a platform has more
invalid characters in filenames, fossil translates them to other (unicode)
characters which are safe. For Windows, the cygwin algorithm is used for
this, so fossil works flawlessly together with the cygwin shell.

Feedback welcome!

Regards,
    Jan Nijtmans

