On 18.11.2011 17:58, Andrea Fontana wrote:
I built a data access layer in C++. This layer works with MongoDB, where
strings are always encoded in UTF-8. I've ported this layer to D using
SWIG. Strings are written correctly to the console, but when I use std.regex
it sometimes throws an exception:
core.exception.UnicodeException@src/rt/util/utf.d(290): invalid
UTF-8 sequence
The byte sequence (for better understanding) is:
[83, 195, 179, 32]
And the string was "Sò " (with accented o and a space)
I'm not a UTF expert, so is this a wrong UTF-8 encoding, or is it a bug in
utf.d?
Which version of std.regex are you using - the one from git master or
the one in the latest release?
If it's the former, I'm willing to look into this over the weekend,
provided you can get hold of a string + pattern pair that fails like this.
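As a quick sanity check on the bytes themselves, here is a minimal sketch (in Python rather than D, purely for illustration; the helper name decode_utf8 is hypothetical and not part of any library discussed here). It strictly decodes the quoted sequence and lists the resulting code points. The sequence decodes without error to U+0053, U+00F3, U+0020, so these four bytes look like well-formed UTF-8 on their own, which suggests the failure may depend on the pattern or on a different input:

```python
def decode_utf8(byte_values):
    """Strictly decode a list of byte values as UTF-8.

    Raises UnicodeDecodeError if the sequence is not well-formed UTF-8.
    """
    return bytes(byte_values).decode("utf-8", errors="strict")


# The byte sequence quoted in the report above.
raw = [83, 195, 179, 32]

text = decode_utf8(raw)

# Code points of the decoded string, formatted as U+XXXX.
code_points = ["U+%04X" % ord(ch) for ch in text]
print(repr(text), code_points)
```

If the decode step raised instead, the data coming out of the C++/SWIG layer would be the first suspect; since it does not, a failing string + pattern pair is still needed to reproduce the exception.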
--
Dmitry Olshansky