On Fri, Jun 10, 2016 at 2:04 AM, Scott Robison <sc...@casaderobison.com>
wrote:

> On Fri, Jun 10, 2016 at 1:37 AM, Joe Mistachkin <sql...@mistachkin.com>
> wrote:
>
>>
>> Scott Robison
>> >
>> > Glad to be able to get to something before everyone else for a change. :)
>> >
>>
>> Yes, thank you very much.
>>
>> Also, I know it's not a lot of fun, but...
>>
>> It would be nice if some new tests covering these edge cases were added to
>> the "utf.test" file.  The "generated section" in the file can be created by
>> uncommenting the "createTestResults $tempPath 100" call.
>>
>
> I'm just about to commit and push a branch with a proposed new
> invalid_utf8 function. It will allow the "Modified UTF-8" NUL (C0 80)
> sequence, as well as the CESU-8 & WTF-8 variants described in the same
> Wikipedia article. I'm including those because the current invalid_utf8
> function allows them.
>
> My code isn't quite as efficient (the profiler reports a 5% difference),
> but I'm too tired to work on it further tonight. Look for the
> "invalid_utf8_table" branch. You may very well see some optimization
> opportunities that I haven't spotted yet.
>

Branch committed. I'll run it against the existing test cases later, and
look at spiffing it up.

FYI, my test code here (a C++ harness) consisted of passing every possible
four-byte buffer to both the old function and my new function. My function
identifies the expected number of "strings" as valid UTF-8. I didn't
eyeball each one to make sure the right ones got through, but getting
exactly the right number is promising to me.
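
For anyone who wants to repeat this kind of check, here is a minimal sketch
of such a harness. It is not the code I actually ran: the Validator
signature, compareValidators, and the three placeholder validators are all
hypothetical names made up for illustration, and the real old and new
functions would need small wrappers to fit that signature. Besides the two
acceptance counts, it also counts the buffers on which the two functions
disagree, which is a stronger check than matching totals alone.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical signature: returns true when the n bytes at p are accepted.
using Validator = bool (*)(const unsigned char *p, std::size_t n);

struct CompareResult {
    std::uint64_t acceptedByOld = 0;  // buffers the first validator accepts
    std::uint64_t acceptedByNew = 0;  // buffers the second validator accepts
    std::uint64_t disagreements = 0;  // buffers where the verdicts differ
};

// Feeds both validators every four-byte buffer whose big-endian value lies
// in [first, last]. The counter is 64-bit so the loop still terminates when
// last is 0xFFFFFFFF.
CompareResult compareValidators(Validator oldFn, Validator newFn,
                                std::uint32_t first, std::uint32_t last) {
    CompareResult r;
    for (std::uint64_t v = first; v <= last; ++v) {
        const unsigned char buf[4] = {
            static_cast<unsigned char>((v >> 24) & 0xFF),
            static_cast<unsigned char>((v >> 16) & 0xFF),
            static_cast<unsigned char>((v >> 8) & 0xFF),
            static_cast<unsigned char>(v & 0xFF),
        };
        const bool a = oldFn(buf, sizeof buf);
        const bool b = newFn(buf, sizeof buf);
        if (a) ++r.acceptedByOld;
        if (b) ++r.acceptedByNew;
        if (a != b) ++r.disagreements;
    }
    return r;
}

// Placeholder validators (NOT the Fossil functions): both accept pure
// ASCII, written two different ways so there is something to compare.
bool asciiByLoop(const unsigned char *p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] >= 0x80) return false;
    }
    return true;
}

bool asciiByMask(const unsigned char *p, std::size_t n) {
    unsigned char acc = 0;
    for (std::size_t i = 0; i < n; ++i) acc |= p[i];
    return (acc & 0x80) == 0;
}

// Placeholder that accepts everything, to show what a disagreement looks like.
bool acceptAll(const unsigned char *, std::size_t) { return true; }
```

Calling compareValidators(oldFn, newFn, 0, 0xFFFFFFFF) covers every
four-byte buffer; a narrower range is handy for a quick smoke test.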

Let me know if you see anything horridly wrong with my code. It's 2am...

-- 
Scott Robison
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
