It seems to be related to toLower too... Here is the line that throws the exception:

    s = replace(s, regex(`[^"a-zA-Z0-9àòèéìù\.]`, "g"), " ").toLower();

Where s is a string with that sequence...

Using dmd 2.056

On Fri, 18/11/2011 at 20:33 +0400, Dmitry Olshansky wrote:
> On 18.11.2011 17:58, Andrea Fontana wrote:
> > I built a data access layer in C++. This layer works with MongoDB, where
> > strings are always encoded using UTF-8. I've ported this layer to D using
> > SWIG. The string is written correctly to the console, but when I use
> > std.regex it sometimes throws an exception:
> >
> > core.exception.UnicodeException@src/rt/util/utf.d(290): invalid UTF-8 sequence
> >
> > The byte sequence (for better understanding) is:
> > [83, 195, 179, 32]
> >
> > And the string was "Sò " (with accented o and a space)
> >
> > I'm not a UTF expert, so is it a wrong UTF-8 encoding or is it a bug in
> > utf.d?
> >
>
> Which version of std.regex are you using - the one from git master or
> the one in the latest release?
> If it's the former then I'm willing to look into this thing on the weekend,
> if you can get hold of a pair: string + pattern that fails like this.
>
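
For reference, here is a minimal sketch (not from the thread) that checks the reported byte sequence on its own, without std.regex or toLower involved. The helper name bytesToValidatedString is hypothetical; std.utf.validate is the standard library call that throws on malformed UTF-8. Note that [195, 179] decodes to U+00F3 (o with acute accent), while the grave form U+00F2 would be [195, 178]. Either way the four bytes are well-formed UTF-8, which suggests the input itself is probably not the problem.

```d
import std.stdio : writeln;
import std.utf : validate;

// Hypothetical helper (not part of any library): reinterprets raw bytes
// as a D string and checks that they form well-formed UTF-8.
string bytesToValidatedString(const(ubyte)[] raw)
{
    auto s = cast(string) raw.idup;
    validate(s); // throws std.utf.UTFException if the bytes are malformed
    return s;
}

void main()
{
    // The byte sequence reported in the original message.
    ubyte[] raw = [83, 195, 179, 32];
    auto s = bytesToValidatedString(raw);

    writeln(s.length); // 4 UTF-8 code units

    // Iterating with dchar decodes code points: prints 83, 243, 32.
    foreach (dchar c; s)
        writeln(cast(uint) c);
}
```

If this runs without throwing, the string is valid UTF-8 on the D side, and the next thing to try would be the smallest string + pattern pair that still fails with replace and toLower.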
