2012/11/20 Richard Hipp <[email protected]>:
> I think that there needs to be two separate versions of
> fossil_utf8_to_unicode(), or perhaps a parameter on one function, so that
> some strings can get the conversion of problematic characters to the unicode
> private-use range, while others go unaltered.

Done now in [e6a1910fa8]. Only conversions involving filenames handle the
problematic characters; all other Unicode<->utf-8 conversions leave them alone.
I think it's ready to be merged to trunk. See below for the tests I did.

> Some uses are less clear.  For example, at
> http://www.fossil-scm.org/fossil/artifact/33d79f5b0a2?ln=850 we are
> converting an entire command-line, which does likely contain some filenames,
> but also might contain other text where *?"<>| are legitimate characters.

Thinking more on it: fossil_system() should not do this special conversion.
It only translates from utf-8 to unicode, and the fossil instance it starts
just does the conversion the other way around. The two conversions
only need to be symmetrical, which means that fossil_system() must
not do any special magic, and certainly must not make any assumption
about what the command line looks like.
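
To illustrate the symmetry argument, here is a minimal Python sketch (not
fossil code). All helper names are hypothetical, and the filename-style
mapping assumes the cygwin convention of adding 0xF000 to reserved
characters. It shows that a command line survives a plain round trip
unchanged, but not a round trip where only one side applies the
filename mapping:

```python
# Sketch only: these helpers are hypothetical stand-ins, not fossil functions.

def utf8_to_utf16(data: bytes) -> bytes:
    """Plain conversion, as the parent would do before starting the child."""
    return data.decode("utf-8").encode("utf-16-le")

def utf16_to_utf8(data: bytes) -> bytes:
    """Plain inverse conversion, as the started child would do."""
    return data.decode("utf-16-le").encode("utf-8")

def utf8_to_utf16_filename(data: bytes) -> bytes:
    """Filename-style conversion (assumed rule: reserved char + 0xF000)."""
    reserved = '"*:<>?|'
    text = data.decode("utf-8")
    mapped = "".join(
        chr(0xF000 + ord(c)) if c in reserved else c for c in text
    )
    return mapped.encode("utf-16-le")

command = 'fossil commit -m "is a < b?" file.txt'.encode("utf-8")

# Symmetrical: plain conversion out, plain conversion back.
plain_round_trip = utf16_to_utf8(utf8_to_utf16(command))

# Asymmetrical: filename mapping out, plain conversion back.
mixed_round_trip = utf16_to_utf8(utf8_to_utf16_filename(command))
```

In this sketch plain_round_trip equals the original command, while
mixed_round_trip does not, because the quotes, '<' and '?' in the commit
message were moved into the private-use range and never moved back.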

Stefan, could you try out [e6a1910fa8]? I did the following tests with
it, and all work fine:
- Compiled [e6a1910fa8] on Linux (Ubuntu 12.4, 64-bit), cygwin and
  win32 (with mingw-w64). Created a repository with 'strange' filenames
  containing characters like ':', '<', '[', '"' and even newline.
- With fossil [e6a1910fa8] everything works fine: I can check out all
  files from the repository on all platforms. On Cygwin/Windows,
  the correct filename translation is done: the names look strange in
  Windows Explorer, but from the cygwin shell everything looks as expected.
- With fossil 1.24, the files from the 'offending' commit cannot be
  checked out (we already knew that), but the commit which
  contains the 'invalid' filenames is also simply not visible at all!
  That's nice (something I didn't expect)! So, once this change is on
  trunk, it operates well with older fossil versions: 'offending' commits
  are completely invisible to users of older fossil versions on the
  timeline, and even on the "branches" page. They wouldn't even know the
  commit is there until they upgrade fossil and do a "fossil all rebuild".
  I'm impressed!

I can no longer think of any reason why [e6a1910fa8] could
be bad for fossil. All my tests have good results. I don't see
any security risk in doing this. I couldn't make fossil crash on
this, not even with older fossil versions.

Richard, what more needs to be done to get this on trunk,
so that all discussions about which characters should be valid
can finally be put to rest? The new (proposed) rule is then: the
only forbidden character in filenames is the backslash
(and NUL and '/', of course). If a platform has more invalid
characters in filenames, fossil translates each of them to another
(unicode) character which is safe.
For Windows, the cygwin algorithm is used for this,
so fossil works flawlessly together with the cygwin shell.
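
A minimal Python sketch of the proposed rule, not the actual code in
[e6a1910fa8]: the function names and the exact reserved set are
assumptions, based on the cygwin convention of adding 0xF000 to each
character that Windows does not allow in a filename.

```python
# Assumed reserved set: the characters cygwin remaps, plus control chars.
RESERVED = set('"*:<>?|') | {chr(c) for c in range(1, 32)}
PRIVATE_USE_BASE = 0xF000

def to_windows_name(name: str) -> str:
    """Map a repository path to a name that Windows accepts (hypothetical)."""
    if "\\" in name or "\0" in name:
        raise ValueError("backslash and NUL are forbidden in filenames")
    # '/' is left alone: it is the path separator, not part of a name.
    return "".join(
        chr(PRIVATE_USE_BASE + ord(c)) if c in RESERVED else c for c in name
    )

def from_windows_name(name: str) -> str:
    """Inverse mapping: move private-use characters back to ASCII."""
    out = []
    for c in name:
        code = ord(c) - PRIVATE_USE_BASE
        if 0 < code < 128 and chr(code) in RESERVED:
            out.append(chr(code))
        else:
            out.append(c)
    return "".join(out)
```

With this sketch, to_windows_name('a:b') yields 'a', U+F03A, 'b', and
from_windows_name() restores the original name, so the mapping is
symmetrical for every accepted filename.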

Feedback welcome!

Regards,
         Jan Nijtmans