Glib::Regex and unicode chars

Jakub Okoński Sun, 19 Aug 2012 10:10:25 -0700

Hello,

I'm trying to implement syntax highlighting, and it works perfectly until I
input some non-ASCII characters, at which point the fetch_pos method gives me
byte offsets rather than character offsets. Take this regex object, for example:


Glib::Regex::create(R"((?<word>class))",
Glib::RegexCompileFlags::REGEX_OPTIMIZE);

The fetch_pos method works perfectly on ASCII text, but as soon as I prepend
any multibyte Unicode character to the string, fetch_pos gives me shifted
values. (The shift is equal to test_string.bytes() - test_string.length().)

I could probably fix this manually by subtracting the difference between
bytes and length from each position, but I'm sure that would not be efficient,
and I would have to look back in the buffer constantly.
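
One way the manual adjustment could avoid rescanning the buffer is to convert offsets incrementally. The sketch below is only an illustration under stated assumptions: Utf8OffsetMapper is a hypothetical helper (not part of glibmm), the text is assumed to be valid UTF-8 held in a std::string, and offsets are assumed to arrive in ascending order, as they would when matches are walked front to back. GLib's g_utf8_pointer_to_offset performs a similar byte-to-character conversion, but it counts from the pointer it is given on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of glibmm: maps ascending UTF-8 byte offsets
// to character offsets while visiting each byte of the buffer only once.
class Utf8OffsetMapper {
public:
    // The string must outlive the mapper; only a reference is stored.
    explicit Utf8OffsetMapper(const std::string& text) : text_(text) {}

    // Returns the number of characters that start before byte_offset.
    // Offsets must be passed in non-decreasing order.
    std::size_t to_char_offset(std::size_t byte_offset) {
        assert(byte_offset >= last_byte_ && byte_offset <= text_.size());
        for (; last_byte_ < byte_offset; ++last_byte_) {
            const unsigned char c = static_cast<unsigned char>(text_[last_byte_]);
            // Bytes of the form 10xxxxxx continue a character; count only the rest.
            if ((c & 0xC0) != 0x80) {
                ++last_char_;
            }
        }
        return last_char_;
    }

private:
    const std::string& text_;
    std::size_t last_byte_ = 0;  // bytes consumed so far
    std::size_t last_char_ = 0;  // characters counted so far
};
```

Because the mapper remembers where it stopped, converting all match positions of one pass costs a single scan of the text rather than one scan per match.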

The same goes for capturing keywords that contain Unicode characters
themselves: for example, capturing "clasś" (note the special character at the
end) results in fetch_pos giving a range of 6, when the word contains 5
characters (but has 6 bytes).
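
The "clasś" case can be checked with a small range conversion. This is a sketch only: Range and byte_range_to_char_range are hypothetical names introduced for illustration, and the input is assumed to be valid UTF-8 with both offsets falling on character boundaries.

```cpp
#include <cstddef>
#include <string>

// Hypothetical half-open range [start, end), in bytes or in characters.
struct Range {
    std::size_t start;
    std::size_t end;
};

// Converts a byte range within UTF-8 text into the matching character range
// by counting the bytes that begin a character (anything but 10xxxxxx).
Range byte_range_to_char_range(const std::string& text, Range bytes) {
    Range chars{0, 0};
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.end && i < text.size(); ++i) {
        if (i == bytes.start) {
            chars.start = count;
        }
        if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80) {
            ++count;
        }
    }
    if (bytes.start >= bytes.end) {
        chars.start = count;  // empty range: start and end coincide
    }
    chars.end = count;
    return chars;
}
```

For the string "clasś" (the bytes "clas" followed by 0xC5 0x9B), the byte range 0 to 6 maps to the character range 0 to 5.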

Maybe I'm not using it correctly, but Glib::Regex is said to support
UTF-8.

Thanks

_______________________________________________
gtkmm-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/gtkmm-list
