On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
> 
> The bug:
> In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).

I took a look at that code, and there is no way it can be correct.  It
doesn’t even try to handle 3- and 4-byte sequences.

May I suggest that whoever rewrites this use BOOST_BINARY?

  http://www.boost.org/doc/libs/1_61_0/libs/utility/utility.htm#BOOST_BINARY

Despite being from Boost, it is implemented purely in C preprocessor code, so 
it should work within Fossil.

I make this suggestion because it seems to me that the key source of the error 
(errors?) in this code is trying to work at the hex level on a problem that is 
inherently about bitwise encoding.
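
To make the bit-pattern point concrete, here is a rough sketch of a check that 
classifies each lead byte by its high bits and then verifies the continuation 
bytes.  This is not Fossil's invalid_utf8(); the name sketch_utf8_valid() and 
its signature are made up for illustration.  The masks are still written in 
hex so it compiles as plain C, with the bit patterns they stand for spelled 
out in the comments.

```c
#include <stddef.h>

/*
** Hypothetical helper, not part of Fossil: return 1 if the n bytes at z
** are well-formed UTF-8, or 0 if not.
*/
int sketch_utf8_valid(const unsigned char *z, size_t n){
  size_t i = 0;
  while( i<n ){
    unsigned char c = z[i];
    unsigned long cp;   /* code point decoded so far */
    unsigned long min;  /* smallest code point legal for this length */
    size_t extra;       /* number of continuation bytes expected */
    size_t k;

    if( (c & 0x80)==0x00 ){         /* 0xxxxxxx: 1-byte (ASCII) */
      i++;
      continue;
    }else if( (c & 0xE0)==0xC0 ){   /* 110xxxxx: 2-byte lead */
      extra = 1; cp = c & 0x1F; min = 0x80;
    }else if( (c & 0xF0)==0xE0 ){   /* 1110xxxx: 3-byte lead */
      extra = 2; cp = c & 0x0F; min = 0x800;
    }else if( (c & 0xF8)==0xF0 ){   /* 11110xxx: 4-byte lead */
      extra = 3; cp = c & 0x07; min = 0x10000;
    }else{
      return 0;                     /* stray 10xxxxxx, or 11111xxx */
    }

    if( n-i<=extra ) return 0;      /* sequence is cut off by end of input */
    i++;
    for(k=0; k<extra; k++){
      c = z[i++];
      if( (c & 0xC0)!=0x80 ) return 0;   /* must be 10xxxxxx */
      cp = (cp<<6) | (c & 0x3F);
    }

    if( cp<min ) return 0;                    /* overlong encoding */
    if( cp>=0xD800 && cp<=0xDFFF ) return 0;  /* UTF-16 surrogate range */
    if( cp>0x10FFFF ) return 0;               /* beyond Unicode */
  }
  return 1;
}
```

With this, the reported input 0xE0, 0xB8, 0x94 takes the 1110xxxx branch, both 
following bytes match 10xxxxxx, and the decoded value is U+0E14, so it is 
accepted.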

There’s an interesting discussion of the rationale for C not having a binary 
literal syntax here:

  http://stackoverflow.com/q/18244726

C++14 has one, though:

  https://en.wikipedia.org/wiki/C%2B%2B14#Binary_literals

I don’t suppose Fossil could get away with using the nonstandard 0b 
binary-literal extension supported by GCC and Clang?  That won’t cover native 
Windows, but Visual C++ 2015 supports the C++14 syntax; I’ve tested it here, 
and it’s accepted in C code, too.
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
