On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
>
> The bug:
> In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).

I took a look at that code, and I don’t see how it can be correct. It doesn’t even try to consider 3- and 4-byte sequences.

May I suggest that whoever rewrites this use BOOST_BINARY?

    http://www.boost.org/doc/libs/1_61_0/libs/utility/utility.htm#BOOST_BINARY

Despite being from Boost, it is implemented purely in C preprocessor code, so it should work within Fossil.

I make this suggestion because it seems to me that the key source of the error (errors?) in this code is that it tries to work at the hex level on a problem that is inherently about bitwise encoding.

There’s an interesting discussion of the rationale for C not having a binary literal syntax here:

    http://stackoverflow.com/q/18244726

C++14 has one, though:

    https://en.wikipedia.org/wiki/C%2B%2B14#Binary_literals

I don’t suppose Fossil could get away with using the nonstandard binary literal extension supported by GCC and Clang? That won’t cover native Windows, but Visual C++ 2015 supports the C++14 syntax; I’ve tested it here, and it’s accepted in C code, too.
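
To make the "bitwise, not hex" point concrete, here is a rough sketch of what a check organized around the UTF-8 bit patterns might look like. It is not Fossil's invalid_utf8() and not a proposed patch: utf8_is_valid() is a hypothetical name, and the sketch assumes the input is a byte buffer with an explicit length. It sticks to shifts and masks with the bit patterns spelled out in comments, so it does not depend on BOOST_BINARY or on any compiler's binary literal support.

```c
#include <stddef.h>

/*
** Hypothetical helper (not part of Fossil): return 1 if the n bytes
** at z are well-formed UTF-8, or 0 otherwise.
*/
static int utf8_is_valid(const unsigned char *z, size_t n){
  size_t i = 0;
  while( i<n ){
    unsigned char c = z[i];
    unsigned int cp;     /* code point decoded so far */
    unsigned int min;    /* smallest code point legal for this length */
    size_t need;         /* number of continuation bytes expected */
    size_t k;

    if( (c>>7)==0x00 ){          /* 0xxxxxxx: one-byte (ASCII) */
      i++;
      continue;
    }else if( (c>>5)==0x06 ){    /* 110xxxxx: lead of 2-byte sequence */
      need = 1;  cp = c & 0x1F;  min = 0x80;
    }else if( (c>>4)==0x0E ){    /* 1110xxxx: lead of 3-byte sequence */
      need = 2;  cp = c & 0x0F;  min = 0x800;
    }else if( (c>>3)==0x1E ){    /* 11110xxx: lead of 4-byte sequence */
      need = 3;  cp = c & 0x07;  min = 0x10000;
    }else{
      return 0;                  /* stray 10xxxxxx or illegal 11111xxx */
    }

    if( n-i-1<need ) return 0;   /* sequence runs off the end */
    for(k=1; k<=need; k++){
      unsigned char t = z[i+k];
      if( (t>>6)!=0x02 ) return 0;   /* must be 10xxxxxx */
      cp = (cp<<6) | (t & 0x3F);
    }

    if( cp<min ) return 0;                    /* overlong encoding */
    if( cp>=0xD800 && cp<=0xDFFF ) return 0;  /* UTF-16 surrogate */
    if( cp>0x10FFFF ) return 0;               /* beyond Unicode range */
    i += need+1;
  }
  return 1;
}
```

With this sketch, the bytes from the bug report (0xE0, 0xB8, 0x94) decode to U+0E14 and are accepted, while an overlong pair such as 0xC0, 0x80 is rejected.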