Danny Trebbien wrote on Fri, Jan 07, 2011 at 13:03:18 -0800: > > Refresh my memory please: did we decide in some thread that the > > recodings should be counted too? > > At one point Stefan asked me if I was interested: > http://article.gmane.org/gmane.comp.version-control.subversion.devel/122540 >
Okay. I no longer remember what was decided (too many threads about this), but I'll respond to your points below. > >> * subversion/tests/cmdline/svnsync_tests_data/copy-bad-line-endings2.dump > >> A dump of a repository with the following features: > >> 1. The log message (`svn:log` revision property) of revision 1 has CRLF > >> line > >> endings. > >> 2. The log message of revision 2 has CR line endings. > >> 3. Revision 3 introduces an `svn:ignore` node property with CRLF line > >> endings. > >> 4. Revision 4 introduces a custom node property, `x:related-to`, with > >> CRLF > >> line endings. > >> > > > > I don't understand > > > > * What is the difference between copy_bad_line_endings() and > > copy_bad_line_endings2()? > > The difference is that copy_bad_line_endings2 tests that non-LF line > endings in a non-translatable property (one that does not begin with > `svn:`) are not affected. Okay. (I overlooked that detail.) > I simply ported one of the three test > repositories that I posted to the Developers' mailing list a while ago > to the cmdline tests architecture. > > > Why don't you call svn_utf_cstring_to_utf8(), like is done in almost > > every other use of opt_arg? > > Well, here's my reasoning for not reencoding into UTF-8: as opposed to > strings such as the username and password, which must be represented > independent of the local machine (so, reencoded into a standard > character encoding), the representation of the value of the "source > encoding" option is tied to the local machine's charset conversion > faculties. Let's say that it's iconv for simplification. The charset > of the values of the "to encoding" and "from encoding" parameters are > system-dependent, so we don't know what charset they are encoded in. > If somehow someone managed to install a charset with an unusual name > (not representable in ASCII) and the system iconv implementation > expects ISO-8859-1 strings for the to/from encoding values, then that > someone might not be able to pick the special charset if we reencode > the to/from encoding values into UTF-8. Presumably however, the user > would know how to pass the ISO-8859-1 representation of the name of > that charset to the program as one of the elements of ARGV, so pass it > to iconv unmodified. > Can you name a charset for which this is not a theoretical concern? To put it succintly, it seems to me you're just worrying too much here. Also: consistency is good. If you're going to special-case one --option's parsing from all others, you need a VERY good reason. > >> + *str = new_str; > >> + } > >> + > >> + /* Only detect inconsistent line ending style. This is accomplished by > >> simply > >> + looking for carriage return (\r) characters. */ > >> + else if (memchr((*str)->data, (unsigned char) '\r', (*str)->len) != > >> NULL) > > > > I see the reason for the s/strchr/memchr/ in the log message: because > > the length is known. I suppose the underlying motivation is to make the > > condition slightly faster? > > And to support NUL characters ('\0'). > I'm not sure that NUL is actually allowed in these propvalues, but I don't feel strongly about this. > >> + { > >> /* Found some. Normalize. */ > >> const char* cstring = NULL; > >> SVN_ERR(svn_subst_translate_cstring2((*str)->data, &cstring, > >> "\n", TRUE, > >> NULL, FALSE, > >> - pool)); > > > > Should this call be to svn_subst_translate_string2()? > > ^^ > (I just noticed that translate_string2() and translate_cstring2() aren't actually related; one is for keywords and one not. Unfortunate naming.) > I didn't call svn_subst_translate_string2() because this is the else > condition for "source encoding is not NULL". In the documentation for > the function, I added that if source encoding is not NULL, then no > reencoding is performed because the string is presumed to be encoded > in UTF-8. > > Perhaps, though, I should use svn_subst_translate_string2() with the > encoding set to "UTF-8". It would greatly simplify normalize_string(), +1. Just call it with encoding=NULL; using the same API in both branches makes life easier for the next person to revv that API. > but might do more work if converting UTF-8 to UTF-8 is not a no-op. Do you know about the UTF-8 multiple-normalization-forms issues? Because if not, you're just worrying too much again :-) > I'm not sure if svn_utf_cstring_to_utf8_ex2() recognizes this special > case. > > > Most svnsync tests are used by svnrdump too. Could you check if > > a similar test should be added to svnrdump_tests.py? > > Sure. I will look. Thanks. Daniel