On Sun, Apr 21, 2013 at 4:48 PM, Branko Čibej <br...@wandisco.com> wrote:
> On 21.04.2013 14:05, Stefan Sperling wrote:
>> On Sun, Apr 21, 2013 at 01:53:43PM +0200, Bert Huijben wrote:
>>> I'd rather pull the case insensitive search part of this new in 1.8 search
>>> feature and do it right in 1.9.
>> What's the issue with the current implementation apart from the
>> test failures on Windows?
>>
>> The behaviour of 'svn log --search' regarding case-sensitivity
>> isn't even documented, so we're not really promising anything.
>>
>> It is possible that some users who are using languages other than
>> English will complain, since ASCII is being matched case-insensitively,
>> and all other characters are being matched case-sensitively.
>> But this is due to a missing feature in APR's implementation of fnmatch().
>>
>> Provided we can fix the 1.8.x tests on Windows I see no reason to
>> change our implementation of log --search. We can simply wait for
>> APR to grow the necessary support for multibyte strings.
>
> The wc-collate-path branch has an svn_utf__glob function that's mainly
> intended for use by SQLite, however, it can be a replacement for
> apr_fnmatch. It uses apr_fnmatch internally, but decomposes the inputs
> to Unicode normalization form D, which keeps diacriticals separate from
> the base letters. In other words, we could easily extend that to do
> completely diacritical-agnostic case-folding matching for Latin
> alphabets (and probably also for Cyrillic scripts).
>
> The idea to manually hack things to work with western Latin alphabets
> seems completely wrong-headed to me.
>
> But yes; in general, case folding is locale-specific. If we wanted to
> support that, we'd need ICU instead of utf8proc. I can imagine that
> eventually being an option, but not a mandatory dependency.
>

According to the Unicode case folding data [1], only two characters use
locale-specific case folding.
So I propose the following plan:

1. Make 'log --search' case-sensitive in trunk and 1.8.x.
2. Merge the utf8proc stuff to trunk.
3. Implement svn_utf__casefold() using utf8proc.
4. Implement 'log --isearch' using
   apr_fnmatch(svn_utf__casefold(pattern), svn_utf__casefold(string)).

[1] http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
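
P.S. To make step 4 concrete, here is a rough sketch of the idea in Python rather than C. It is only an illustration, not the proposed Subversion code: casefold() below is a hypothetical stand-in for svn_utf__casefold(), Python's str.casefold() and unicodedata stand in for utf8proc, and fnmatch.fnmatchcase() stands in for apr_fnmatch(). The helper name isearch_match() is made up for this example.

```python
import fnmatch
import unicodedata


def casefold(text: str) -> str:
    """Hypothetical stand-in for the proposed svn_utf__casefold().

    Decomposes to NFD, applies locale-independent Unicode case folding,
    then normalizes again because folding can yield non-normalized text.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def isearch_match(pattern: str, string: str) -> bool:
    """Case-insensitive glob match built from a case-sensitive matcher.

    Both sides are folded first, so the matcher itself needs no
    knowledge of case. Note the argument order: fnmatchcase() takes the
    string first and the pattern second, unlike apr_fnmatch().
    """
    return fnmatch.fnmatchcase(casefold(string), casefold(pattern))


if __name__ == "__main__":
    print(isearch_match("*čibej*", "Reviewed by Branko ČIBEJ"))
    print(isearch_match("*STRASSE*", "Hauptstraße"))
    print(isearch_match("fix*", "Bugfix"))
```

Because the folding is locale-independent, the two locale-specific characters mentioned above get the default mapping here, not the Turkic one.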