On 2006-05-07 04:51:46 -0500, Peter Samuelson wrote:
> [Vincent Lefevre]
> > There are also other file formats, such as ogg files, whose meta-data
> > are encoded in UTF-8.
>
> You can't put subversion keywords in vorbis comments anyway; it's not a
> text-based format.

You're wrong. It is possible: meta-data are UTF-8 text directly
included in the ogg file. Look with "strings" for instance.

> The reason I ask is that the user's LC_CTYPE is already used to
> determine the encoding of filenames.

This is a different matter. This one doesn't hurt very much (it will
only break if users with different locales access the working copy)
and won't break file formats. Concerning the file contents, the
encoding is fixed on the repository side, and Subversion doesn't
perform any conversion into the user's locale encoding.

> I am showing the link between that and the content of XML documents,
> and why you want the two encodings to be the same.

I didn't say I wanted these two encodings to be the same.

> You're also biased toward UTF-8 content.

Yes, but one needs to make a choice, and UTF-8 is the common one.
Otherwise, the solution is to do transliteration into US-ASCII (it
could even be better).

> Files can be in any encoding. Why do you assume that users will
> never produce XML files, or indeed random source code, in
> ISO-8859-2?

In this case they shouldn't use keywords in them, and wait for
bug 2332 to be fixed. Using the locales doesn't solve this problem
anyway, since different users may use different locales.

Also, ISO-8859-2 shouldn't be used in XML for files that are meant
to be shared (unless the users agree to use it), because an XML
parser isn't required to support ISO-8859-2. UTF-8 is OK.

> > Well, in his second sentence, Julian said: "... is better than
> > mixed locales."
>
> He's agreeing with my objection, where I ask what the point is of
> localising the language of a date string but not localising the
> encoding.

Subversion doesn't localize the encoding. Your patch doesn't fix that.

Also, one may wonder if the date should be localized at all. IMHO,
it is an error to do that globally. For instance, why would you
include a French keyword expansion in an English file?

The right solution is to improve the keyword mechanism (e.g. bug 890).
Your patch is premature.

> Are you trying to argue that the encoding is specific to the file
> but that the human language is not?

The problem is that the file encoding is fixed and doesn't depend on
the user's locales. So, this should be the same for keyword expansion
in order to avoid mixed locales.

> That seems pretty absurd to me. Either localise both (what I think)
> or localise neither (what Ivan Zhakov thinks).

*Currently* it's much better to localize neither. This won't break
anything.

> > > Ivan thinks keywords should not be localised at all, which also
> > > solves the problem, but that's a lot harder to implement.
> >
> > No, it doesn't solve the problem.
>
> Sure it does.

No, you'll still have the problems with non-ASCII characters (remember
that they can also occur in user names).

Here's a summary of the pros and the cons of different solutions
before charset information can be stored in properties (as suggested
in bug 2332):

  * Using UTF-8 (current behavior):
    + Pros: fixed encoding; no loss; compatible with file formats
      based on UTF-8, which are common (UTF-8 is more or less the
      default encoding nowadays).
    + Cons: may be incompatible with some documents.

  * Using US-ASCII (transliteration):
    + Pros: fixed encoding; compatible with any encoding (except
      EBCDIC, but this one is not tractable) and any file format.
    + Cons: small loss for non-ASCII characters.

  * Using the encoding specified by the locales:
    + Pros: compatible with tools that don't understand encodings
      different from the one specified by the locales.
    + Cons: all the documents using keywords should have the same
      encoding; also requires every user of the repository to use
      the same locales or compatible ones (which may require root
      access to install them, or may not even be available on some
      OS's); if externals are used, the corresponding repositories
      should assume compatible encodings; not backward compatible.

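To make the summary above concrete, here is a rough sketch in Python
(not Subversion code; the function names are made up for this
illustration) of what each of the three choices would write into a
file for a keyword value containing a non-ASCII character, such as a
user name. The last helper checks whether the written bytes read back
as the same text under the document's own encoding, which is the
"mixed locales" problem in miniature. The transliteration shown (drop
combining accents, replace the rest with "?") is only one possible
scheme, assumed here for simplicity.

```python
import unicodedata


def expand_utf8(value: str) -> bytes:
    """Option 1: always write the keyword value as UTF-8."""
    return value.encode("utf-8")


def expand_ascii(value: str) -> bytes:
    """Option 2: transliterate to US-ASCII (assumed scheme, lossy)."""
    # Split accented letters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    # Anything still outside ASCII becomes "?".
    return stripped.encode("ascii", errors="replace")


def expand_locale(value: str, locale_encoding: str) -> bytes:
    """Option 3: write the value in the user's locale encoding."""
    return value.encode(locale_encoding, errors="replace")


def renders_consistently(expansion: bytes, written_as: str,
                         read_as: str) -> bool:
    """True if the bytes mean the same text in both encodings."""
    try:
        return expansion.decode(written_as) == expansion.decode(read_as)
    except UnicodeDecodeError:
        # The bytes are not even valid in one of the encodings.
        return False


if __name__ == "__main__":
    author = "Lef\u00e8vre"
    print(expand_utf8(author))
    print(expand_ascii(author))
    print(expand_locale(author, "iso-8859-1"))
    # UTF-8 expansion inside a document that is read as ISO-8859-2:
    print(renders_consistently(expand_utf8(author),
                               "utf-8", "iso-8859-2"))
    # ASCII expansion inside the same document:
    print(renders_consistently(expand_ascii(author),
                               "ascii", "iso-8859-2"))
```

Only the US-ASCII expansion reads back unchanged in every
ASCII-compatible document encoding, at the price of losing the accent;
the UTF-8 and locale-based expansions are lossless only as long as the
document and all its readers agree on that one encoding.
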
-- 
Vincent Lefèvre <[EMAIL PROTECTED]> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA