Branko Čibej wrote on Tue, Mar 15, 2016 at 06:13:28 +0100:
> On 15.03.2016 01:08, Daniel Shahaf wrote:
> > [email protected] wrote on Fri, Feb 19, 2016 at 22:11:11 -0000:
> >> +/* Return TRUE if STR matches PATTERN.  Else, return FALSE.  Assumes that
> >> + * PATTERN is a UTF-8 string normalized to form C with case folding
> >> + * applied.  Use BUF for temporary allocations. */
> >> +static svn_boolean_t
> >> +match(const char *pattern, const char *str, svn_membuf_t *buf)
> >> +{
> >> +  svn_error_t *err;
> >> +
> >> +  err = svn_utf__normalize(&str, str, strlen(str), TRUE /* casefold */,
> >> buf);
> >> +  if (err)
> >> +    {
> >> +      /* Can't match invalid data. */
> >> +      svn_error_clear(err);
> >> +      return FALSE;
> >> +    }
> >> +
> >> +  return apr_fnmatch(pattern, str, 0) == APR_SUCCESS;
> > Should there be a command-line flag to disable casefolding?
> >
> > E.g., to allow users to grep for identifiers (function/variable/file
> > names) using their exact case?  Do people who use 'log --search' need it
> > to be case-sensitive?  (I don't use 'log --search' often.)
>
> I'd prefer to keep things simple.
>
Fair enough.  I was concerned that users might perceive removing
case-sensitive search as a regression.  (I don't like having new knobs
any more than you do.)

> And as I recall, this whole discussion began because apr_fnmatch
> doesn't like non-ASCII characters?

s/non-ASCII/multibyte/, but yes.

> > Even if casefolding is disabled, we should still apply Unicode
> > normalization to form C.
>
> There's no particular reason it has to be form C, as long as both the
> pattern and the string are normalized to the same form.

Agreed, that's what I meant to say.

> Using form D is possibly even a bit faster, since that's the internal
> 32-bit representation used by utf8proc. It's a pity we don't have
> a 32-bit-char fnmatch implementation.
> Still, as you note below, normalizing a glob pattern isn't entirely
> trivial to do correctly.
>
> > P.S. This patch introduces a minor behaviour change: before this patch,
> > the search pattern «foo[A-z]bar» would match the log message «foo_bar»,
> > whereas after this change it would not.  (This is because the pattern is
> > now casefolded before being passed to APR, and '_' is between 'A'
> > and 'z' but not between 'A' and 'Z', when compared as C chars.)  I doubt
> > anyone will notice this behaviour change; I'm just mentioning it for
> > completeness.
>
> Mmhh ... this is what comes of 'obviously trivial' solutions. :)

The important thing is that there is no _other_ apr_fnmatch() syntax
that changes meaning through case-folding the pattern, at least in
apr-1.5 with flags==0.

Cheers,

Daniel
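
For anyone who wants to reproduce the «foo[A-z]bar» behaviour change from the
P.S. without building Subversion, here is a rough sketch.  It is not the code
from the patch: it assumes POSIX fnmatch() as a stand-in for apr_fnmatch(),
and the helper names (ascii_casefold, match_exact, match_casefolded) are made
up for illustration.  The folding is ASCII-only, whereas svn_utf__normalize()
also handles non-ASCII text and Unicode normalization.  Unlike the patch,
which expects the caller to pass an already-folded pattern, the sketch folds
the pattern itself.

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>

/* ASCII-only stand-in for the case folding done by svn_utf__normalize().
   Copies SRC into DST (DST_SIZE bytes), lowercasing each byte and
   truncating if SRC does not fit. */
static void
ascii_casefold(char *dst, size_t dst_size, const char *src)
{
  size_t i = 0;

  assert(dst != NULL && src != NULL && dst_size > 0);
  for (; src[i] != '\0' && i + 1 < dst_size; i++)
    dst[i] = (char)tolower((unsigned char)src[i]);
  dst[i] = '\0';
}

/* Behaviour before the patch: the pattern is used exactly as typed. */
static int
match_exact(const char *pattern, const char *str)
{
  return fnmatch(pattern, str, 0) == 0;
}

/* Behaviour after the patch: pattern and string are both case-folded
   before matching, so «[A-z]» effectively becomes «[a-z]». */
static int
match_casefolded(const char *pattern, const char *str)
{
  char folded_pattern[256];
  char folded_str[256];

  ascii_casefold(folded_pattern, sizeof folded_pattern, pattern);
  ascii_casefold(folded_str, sizeof folded_str, str);
  return fnmatch(folded_pattern, folded_str, 0) == 0;
}
```

In the default "C" locale, match_exact("foo[A-z]bar", "foo_bar") should
succeed, because '_' (0x5F) lies between 'A' (0x41) and 'z' (0x7A), while
match_casefolded() with the same arguments should fail, because the folded
range starts at 'a' (0x61).  Range semantics in other locales may differ.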

