Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 7:10 PM, Joe Mistachkin 
wrote:

>
> Scott Robison wrote:
> >
> > Okay, thanks for all the help. I've committed some new test cases that
> > demonstrate errors in the trunk invalid_utf8. 16 tests fail on trunk,
> > none fail on invalid_utf8_table branch (which of course doesn't mean
> > there aren't bugs, just that the sample data doesn't exercise a buggy
> > path, or that I did something wrong in adding the tests).
>
> Thanks for the new tests.
>
> >
> > But seriously, take a look at the new test cases. Let me know whether
> > you want to tweak your function or want me to merge mine to trunk.
> >
>
> Sure, I'll look at it.  I'm pretty sure we'll go with your function as
> it seems easier to understand and maintain.  Thanks for the bugfixes.


That'd be awesome. Much more satisfying to hack on fossil than some of the
other coding stuff I've done "for fun" in the past!

-- 
Scott Robison
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Joe Mistachkin

Scott Robison wrote:
>
> Okay, thanks for all the help. I've committed some new test cases that
> demonstrate errors in the trunk invalid_utf8. 16 tests fail on trunk,
> none fail on invalid_utf8_table branch (which of course doesn't mean
> there aren't bugs, just that the sample data doesn't exercise a buggy
> path, or that I did something wrong in adding the tests).

Thanks for the new tests.

>
> But seriously, take a look at the new test cases. Let me know whether
> you want to tweak your function or want me to merge mine to trunk.
>

Sure, I'll look at it.  I'm pretty sure we'll go with your function as
it seems easier to understand and maintain.  Thanks for the bugfixes.

--
Joe Mistachkin



Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 5:24 PM, Joe Mistachkin 
wrote:

>
> Scott Robison wrote:
> >
> > So my expectation that it would automatically update the utf.test file is
> > incorrect? I'm supposed to manually integrate that file back to utf.test?
> >
>
> Yes and yes.
>
> >
> > If I want to specify a new test that should fail but does not currently,
> > am I just supposed to manually tweak the desired expected output of
> > utf-check.txt before integration? That's how it is looking to me at the
> > moment.
> >
>
> Yes.
>

Okay, thanks for all the help. I've committed some new test cases that
demonstrate errors in the trunk invalid_utf8. 16 tests fail on trunk, none
fail on invalid_utf8_table branch (which of course doesn't mean there
aren't bugs, just that the sample data doesn't exercise a buggy path, or
that I did something wrong in adding the tests).

I've updated the invalid_utf8_table branch with performance optimizations.
Profiling the two versions (tip of both trunk & branch) on my machine with
my test data (all possible byte combinations up to 4 bytes in length) says
that mine is now faster. The tip of trunk version used about 130 billion
CPU cycles vs. about 112 billion for the branch version.

HA! ;)

But seriously, take a look at the new test cases. Let me know whether you
want to tweak your function or want me to merge mine to trunk.

-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Joe Mistachkin

Scott Robison wrote:
>
> So my expectation that it would automatically update the utf.test file is
> incorrect? I'm supposed to manually integrate that file back to utf.test?
>

Yes and yes.

>
> If I want to specify a new test that should fail but does not currently,
> am I just supposed to manually tweak the desired expected output of
> utf-check.txt before integration? That's how it is looking to me at the
> moment.
>

Yes.

--
Joe Mistachkin



Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 4:23 PM, Joe Mistachkin 
wrote:

>
> Scott Robison wrote:
> >
> > Also: Simply uncommenting the "createTestResults $tempPath 100" call
> > doesn't seem to be doing anything for me. Here is what I'm doing:
> >
>
> Here are the steps I just used here locally:
>
> 1. Uncomment the "createTestResults" line.
> 2. Run "tclsh test\tester.tcl fossil.exe utf" from the checkout.
> 3. Results are located in "%TEMP%\utf-check.txt".
>
> If you add extra entries to the array prior to these steps, those new
> results should appear in the "utf-check.txt" file as well.
>

Okay, that was very helpful.

So my expectation that it would automatically update the utf.test file is
incorrect? I'm supposed to manually integrate that file back to utf.test?

If I want to specify a new test that should fail but does not currently, am
I just supposed to manually tweak the desired expected output of
utf-check.txt before integration? That's how it is looking to me at the
moment.

-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Joe Mistachkin

Scott Robison wrote:
>
> Also: Simply uncommenting the "createTestResults $tempPath 100" call
> doesn't seem to be doing anything for me. Here is what I'm doing:
>

Here are the steps I just used here locally:

1. Uncomment the "createTestResults" line.
2. Run "tclsh test\tester.tcl fossil.exe utf" from the checkout.
3. Results are located in "%TEMP%\utf-check.txt".

If you add extra entries to the array prior to these steps, those new
results should appear in the "utf-check.txt" file as well.

--
Joe Mistachkin



Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 3:27 PM, Scott Robison 
wrote:

> On Fri, Jun 10, 2016 at 12:26 PM, Scott Robison 
> wrote:
>
>> On Jun 10, 2016 6:04 AM, "Jan Nijtmans"  wrote:
>> >
>> > 2016-06-10 10:12 GMT+02:00 Scott Robison:
>> > > FYI, my test code here (C++ harness) consisted of passing every
>> > > possible four byte buffer to the old function and my new function.
>> > > My function identifies the expected number of "strings" as valid
>> > > UTF-8. I didn't eyeball each one to make sure the right ones got
>> > > through, but getting the exact right number is promising to me.
>> > >
>> > > Let me know if you see anything horridly wrong with my code. It's
>> > > 2am...
>> >
>> > It turns out that your code appears fine, it's just less efficient
>> > than mine ;-)
>>
>> You are correct. I was going for correct and readable over fast and
>> wrong. {hides}
>>
>> > There were test-failures but all failures turned out to be errors in the
>> > expected test-outcome. I fixed those test-cases now, added more of
>> > them, and fixed the invalid_utf8() function in trunk. Now all tests pass
>> > with both trunk code and your code.
>> >
>> > Many thanks, Scott !  Once more, fossil got better than it was!
>>
>> Glad it's working. I'm all for faster correct code, too. I had run both
>> implementations through the profiler last night and knew my code was a few
>> percent slower, but figured optimization would narrow the gap when I was
>> alert enough to do more good.
>>
>> I'll take the current code and run it through my ugly test harness and
>> make sure it prints the right numbers in a bit.
>>
>
> The trunk version is still identifying certain sequences as valid. I've
> looked at the utf.test file but can't figure out where to put in cases
> which should fail. Can anyone give me a pointer on this?
>

Also: Simply uncommenting the "createTestResults $tempPath 100" call
doesn't seem to be doing anything for me. Here is what I'm doing:

1. Added four lines to the data array / list / whatever in utf.test
(appended them with numbers 201 through 204).
2. Uncommented the createTestResults call.
3. Changed to an empty directory.
4. Ran the test harness as "tclsh \dev\fossil\test\tester.tcl
\dev\fossil\win\fossil.exe".

The generated section remains unchanged. I'm probably missing some step,
but I can't figure it out.
-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 12:26 PM, Scott Robison 
wrote:

> On Jun 10, 2016 6:04 AM, "Jan Nijtmans"  wrote:
> >
> > 2016-06-10 10:12 GMT+02:00 Scott Robison:
> > > FYI, my test code here (C++ harness) consisted of passing every
> > > possible four byte buffer to the old function and my new function.
> > > My function identifies the expected number of "strings" as valid
> > > UTF-8. I didn't eyeball each one to make sure the right ones got
> > > through, but getting the exact right number is promising to me.
> > >
> > > Let me know if you see anything horridly wrong with my code. It's
> > > 2am...
> >
> > It turns out that your code appears fine, it's just less efficient
> > than mine ;-)
>
> You are correct. I was going for correct and readable over fast and wrong.
> {hides}
>
> > There were test-failures but all failures turned out to be errors in the
> > expected test-outcome. I fixed those test-cases now, added more of
> > them, and fixed the invalid_utf8() function in trunk. Now all tests pass
> > with both trunk code and your code.
> >
> > Many thanks, Scott !  Once more, fossil got better than it was!
>
> Glad it's working. I'm all for faster correct code, too. I had run both
> implementations through the profiler last night and knew my code was a few
> percent slower, but figured optimization would narrow the gap when I was
> alert enough to do more good.
>
> I'll take the current code and run it through my ugly test harness and
> make sure it prints the right numbers in a bit.
>

The trunk version is still identifying certain sequences as valid. I've
looked at the utf.test file but can't figure out where to put in cases
which should fail. Can anyone give me a pointer on this?



-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Jun 10, 2016 6:04 AM, "Jan Nijtmans"  wrote:
>
> 2016-06-10 10:12 GMT+02:00 Scott Robison:
> > FYI, my test code here (C++ harness) consisted of passing every possible
> > four byte buffer to the old function and my new function. My function
> > identifies the expected number of "strings" as valid UTF-8. I didn't
> > eyeball each one to make sure the right ones got through, but getting
> > the exact right number is promising to me.
> >
> > Let me know if you see anything horridly wrong with my code. It's 2am...
>
> It turns out that your code appears fine, it's just less efficient than
> mine ;-)

You are correct. I was going for correct and readable over fast and wrong.
{hides}

> There were test-failures but all failures turned out to be errors in the
> expected test-outcome. I fixed those test-cases now, added more of
> them, and fixed the invalid_utf8() function in trunk. Now all tests pass
> with both trunk code and your code.
>
> Many thanks, Scott !  Once more, fossil got better than it was!

Glad it's working. I'm all for faster correct code, too. I had run both
implementations through the profiler last night and knew my code was a few
percent slower, but figured optimization would narrow the gap when I was
alert enough to do more good.

I'll take the current code and run it through my ugly test harness and make
sure it prints the right numbers in a bit.


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Jan Nijtmans
2016-06-10 10:12 GMT+02:00 Scott Robison:
> FYI, my test code here (C++ harness) consisted of passing every possible
> four byte buffer to the old function and my new function. My function
> identifies the expected number of "strings" as valid UTF-8. I didn't eyeball
> each one to make sure the right ones got through, but getting the exact
> right number is promising to me.
>
> Let me know if you see anything horridly wrong with my code. It's 2am...

It turns out that your code appears fine, it's just less efficient than mine ;-)

There were test-failures but all failures turned out to be errors in the
expected test-outcome. I fixed those test-cases now, added more of
them, and fixed the invalid_utf8() function in trunk. Now all tests pass
with both trunk code and your code.

Many thanks, Scott !  Once more, fossil got better than it was!

Regards,
   Jan Nijtmans


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 2:04 AM, Scott Robison 
wrote:

> On Fri, Jun 10, 2016 at 1:37 AM, Joe Mistachkin 
> wrote:
>
>>
>> Scott Robison
>> >
>> > Glad to be able to get to something before everyone else for a
>> > change. :)
>> >
>>
>> Yes, thank you very much.
>>
>> Also, I know it's not a lot of fun, but...
>>
>> It would be nice if some new tests covering these edge cases were added to
>> the "utf.test" file.  The "generated section" in the file can be created
>> by uncommenting the "createTestResults $tempPath 100" call.
>>
>
> I'm just about to commit and push a branch with a proposed new
> invalid_utf8 function. It will allow the "Modified UTF-8" NUL (C0 80)
> sequence, as well as the CESU-8 & WTF-8 variants described in the same
> wikipedia article. I'm including those because the current invalid_utf8
> function allowed them.
>
> My code isn't quite as efficient (profiler reports 5% diff). But I'm too
> tired to work on it further tonight. Look for "invalid_utf8_table" branch.
> You may very well see some optimization opportunities I haven't yet.
>

Branch committed. I'll run it against the existing test cases later, and
look at spiffing it up.

FYI, my test code here (C++ harness) consisted of passing every possible
four byte buffer to the old function and my new function. My function
identifies the expected number of "strings" as valid UTF-8. I didn't
eyeball each one to make sure the right ones got through, but getting the
exact right number is promising to me.
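
A cut-down version of that kind of exhaustive harness might look like the sketch below. It is written in C rather than C++, and everything in it is an assumption made for illustration: loose_check and strict_check are stand-in validators invented here (they are not the trunk or branch functions), and strict_check rejects the Modified UTF-8, CESU-8 and WTF-8 forms that the functions in this thread accept.

```c
#include <stddef.h>

typedef int (*validator_fn)(const unsigned char *z, size_t n);

/* Shared worker for the two stand-in validators. With strict==0 only the
** bit patterns are checked; with strict!=0 overlong forms, surrogates and
** values above 0x10FFFF are rejected as well. */
static int check_utf8(const unsigned char *z, size_t n, int strict){
  static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
  size_t i = 0;
  while( i<n ){
    unsigned char c = z[i];
    size_t len, k;
    unsigned long cp;
    if( c<0x80 ){ len = 1; cp = c; }
    else if( (c & 0xE0)==0xC0 ){ len = 2; cp = c & 0x1F; }
    else if( (c & 0xF0)==0xE0 ){ len = 3; cp = c & 0x0F; }
    else if( (c & 0xF8)==0xF0 ){ len = 4; cp = c & 0x07; }
    else return 0;                   /* stray continuation byte or 0xF8..0xFF */
    if( n-i<len ) return 0;          /* sequence runs past the end of buffer */
    for(k=1; k<len; k++){
      if( (z[i+k] & 0xC0)!=0x80 ) return 0;  /* must look like 10xxxxxx */
      cp = (cp<<6) | (z[i+k] & 0x3F);
    }
    if( strict ){
      if( cp<min_cp[len] ) return 0;             /* overlong encoding */
      if( cp>=0xD800 && cp<=0xDFFF ) return 0;   /* surrogate */
      if( cp>0x10FFFF ) return 0;                /* beyond Unicode range */
    }
    i += len;
  }
  return 1;
}
static int loose_check(const unsigned char *z, size_t n){
  return check_utf8(z, n, 0);
}
static int strict_check(const unsigned char *z, size_t n){
  return check_utf8(z, n, 1);
}

/* Count the buffers of exactly len bytes (1..4) that fn accepts. */
static unsigned long long count_accepted(validator_fn fn, size_t len){
  unsigned char buf[4];
  unsigned long long v, total, hits = 0;
  size_t k;
  if( len<1 || len>4 ) return 0;
  total = 1ULL << (8*len);
  for(v=0; v<total; v++){
    for(k=0; k<len; k++) buf[k] = (unsigned char)(v >> (8*k));
    if( fn(buf, len) ) hits++;
  }
  return hits;
}

/* Count the buffers of exactly len bytes on which a and b disagree. */
static unsigned long long count_disagreements(validator_fn a, validator_fn b,
                                              size_t len){
  unsigned char buf[4];
  unsigned long long v, total, diffs = 0;
  size_t k;
  if( len<1 || len>4 ) return 0;
  total = 1ULL << (8*len);
  for(v=0; v<total; v++){
    for(k=0; k<len; k++) buf[k] = (unsigned char)(v >> (8*k));
    if( (a(buf, len)!=0)!=(b(buf, len)!=0) ) diffs++;
  }
  return diffs;
}
```

Comparing a count against a number worked out by hand is a cheap sanity check. For two byte buffers, for instance, strict_check should accept 128*128 pairs of ASCII bytes plus 30*64 two byte characters. Calling count_disagreements with two real implementations points at the lengths where they differ.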

Let me know if you see anything horridly wrong with my code. It's 2am...

-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 1:37 AM, Joe Mistachkin 
wrote:

>
> Scott Robison
> >
> > Glad to be able to get to something before everyone else for a change. :)
> >
>
> Yes, thank you very much.
>
> Also, I know it's not a lot of fun, but...
>
> It would be nice if some new tests covering these edge cases were added to
> the "utf.test" file.  The "generated section" in the file can be created by
> uncommenting the "createTestResults $tempPath 100" call.
>

I'm just about to commit and push a branch with a proposed new invalid_utf8
function. It will allow the "Modified UTF-8" NUL (C0 80) sequence, as well
as the CESU-8 & WTF-8 variants described in the same wikipedia article. I'm
including those because the current invalid_utf8 function allowed them.

My code isn't quite as efficient (profiler reports 5% diff). But I'm too
tired to work on it further tonight. Look for "invalid_utf8_table" branch.
You may very well see some optimization opportunities I haven't yet.

-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Jan Nijtmans
2016-06-10 9:24 GMT+02:00 Scott Robison:
> On Fri, Jun 10, 2016 at 1:15 AM, Jan Nijtmans 
> wrote:
>>
>> 2016-06-10 2:01 GMT+02:00 Scott Robison:
>> > I just committed a one line fix (with multiple lines of comments to
>> > clarify what the code is doing in the tricky part).
>>
>> Scott, I owe you. Many thanks! You are completely right, this was an
>> edge-case not covered for.
>
>
> Glad to be able to get to something before everyone else for a change. :)
>
> FYI: There is another problem, I think, with some invalid 4 byte sequences
> being accepted (F4 00 80 80, for example). I'm working on a proposed fix.

Yeah... after your fix, the following byte sequence is accepted as
valid while really it isn't:
\xE0\x80\x80
(discovered by simply running the test suite)

So, it's still not correct yet.
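
To see why that sequence has to be rejected, it helps to decode it by hand. The sketch below is only an illustration with made-up helper names (decode3 and is_overlong3 are not Fossil functions): it extracts the payload bits of a three byte sequence and flags the result as overlong when the value would have fit in one or two bytes.

```c
/* Decode a three byte sequence whose bit patterns are already known to be
** 1110xxxx 10xxxxxx 10xxxxxx. Hypothetical helper, for illustration only. */
static unsigned decode3(unsigned char b0, unsigned char b1, unsigned char b2){
  return ((unsigned)(b0 & 0x0F)<<12)
       | ((unsigned)(b1 & 0x3F)<<6)
       |  (unsigned)(b2 & 0x3F);
}

/* A three byte sequence is overlong when the decoded value is below 0x800,
** because such a value can be encoded in one or two bytes. */
static int is_overlong3(unsigned char b0, unsigned char b1, unsigned char b2){
  return decode3(b0, b1, b2)<0x800;
}
```

With these helpers, E0 80 80 decodes to 0 and is overlong, while E0 B8 94 decodes to 0x0E14 and is fine. Put differently, after a lead byte of 0xE0 the second byte has to be at least 0xA0.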

Regards,
Jan Nijtmans


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Joe Mistachkin

Scott Robison
>
> Glad to be able to get to something before everyone else for a change. :)
> 

Yes, thank you very much.

Also, I know it's not a lot of fun, but...

It would be nice if some new tests covering these edge cases were added to
the "utf.test" file.  The "generated section" in the file can be created by
uncommenting the "createTestResults $tempPath 100" call.

--
Joe Mistachkin



Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Scott Robison
On Fri, Jun 10, 2016 at 1:15 AM, Jan Nijtmans 
wrote:

> 2016-06-10 2:01 GMT+02:00 Scott Robison:
> > I just committed a one line fix (with multiple lines of comments to
> > clarify what the code is doing in the tricky part).
>
> Scott, I owe you. Many thanks! You are completely right, this was an
> edge-case not covered for.
>

Glad to be able to get to something before everyone else for a change. :)

FYI: There is another problem, I think, with some invalid 4 byte sequences
being accepted (F4 00 80 80, for example). I'm working on a proposed fix.

-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-10 Thread Jan Nijtmans
2016-06-10 2:01 GMT+02:00 Scott Robison:
> I just committed a one line fix (with multiple lines of comments to
> clarify what the code is doing in the tricky part).

Scott, I owe you. Many thanks! You are completely right, this was an
edge case that was not covered.

Regards,
   Jan Nijtmans


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread Scott Robison
On Thu, Jun 9, 2016 at 6:19 PM, Warren Young  wrote:

> On Jun 9, 2016, at 6:01 PM, Scott Robison  wrote:
> >
> > On Thu, Jun 9, 2016 at 2:12 PM, Warren Young  wrote:
> > On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
> > >
> > > The bug:
> > > In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> > > 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).
> >
> > I took a look at that code, and there is no possibility for it to be
> > correct.  It doesn’t even try to consider 3- and 4- byte sequences.
> >
> > It does consider 3 & 4 byte sequences in a round about way.
>
> I don’t see that it is checking that the top 2 bits of bytes 3 and 4 are
> 10, the only legal values.
>

Line 162 checks the current byte (c2) and the next byte (c) for validity.
Assuming those checks pass, line 171 sets the next byte c to be a prefix
byte for the next shorter sequence length ((c2 << 1) + 1) (or space if the
valid two byte sequence passed).

The next iteration of the loop uses the "faked" prefix byte and checks the
next byte for validity.

In this case, the old code took:

> 0xE0 0xB8 0x94

Confirmed that c2==0xE0 was a valid prefix byte and that c==0xB8 was a
valid next byte.

Then, instead of keeping the value of c, it reassigns c to ((c2<<1)+1), or
((0xE0<<1)+1) == (0xC0+1) == 0xC1.

The bug is that if a three byte sequence starts with 0xE0, the transformed
byte becomes 0xC1, which is the prefix of an invalid two byte sequence. So
my code checks for that edge case and changes the value to 0xC2.

The next iteration of the loop checks the "forged but okay for our
purposes" value of c2:

> 0xC2 0x94

Confirms that c2==0xC2 is valid and c==0x94 is valid.

Since this is a valid two byte sequence now, it sets the value of c to a
space character, which is always valid utf-8.

It's not intuitive, and I only discovered it after staring at the code for
a while and playing computer with a pencil and paper. I didn't test every
possible byte sequence, of course, but the handful I tried manually now
decode correctly. The only problem I found was with three byte sequences
that start with 0xE0.
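
The forging step walked through above can be played out in a few lines of C. This is a sketch of the step as described in this message, not the code from lookslike.c; the function name forge_prefix and its apply_fix flag are invented for the illustration, and it assumes c2 is a lead byte whose first continuation byte has already passed its check.

```c
/* Hypothetical helper, not Fossil's actual code: return the "forged" lead
** byte that stands in for the rest of a sequence once its first
** continuation byte has been consumed. */
static unsigned char forge_prefix(unsigned char c2, int apply_fix){
  unsigned char c;
  if( c2<0xE0 ){
    return ' ';   /* a two byte sequence is complete: carry on with a space */
  }
  /* Shift the length marker left by one bit and keep only the low 8 bits:
  ** 0xE0 -> 0xC1, 0xE1 -> 0xC3, 0xF0 -> 0xE1. */
  c = (unsigned char)((c2<<1)+1);
  if( apply_fix && c==0xC1 ){
    c = 0xC2;     /* 0xC1 is never a valid lead byte, so substitute 0xC2 */
  }
  return c;
}
```

Without the fix, the input 0xE0 0xB8 0x94 turns into the pair 0xC1 0x94 on the second pass, which is rejected. With the fix it turns into 0xC2 0x94, which passes.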

> Perhaps you will also extend Fossil’s test suite in this area.  A bit of
> Googling turns up these UTF-8 test corpora:
>
>   https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>   http://www.columbia.edu/~fdc/utf8/


I've never looked at the fossil test suite. I'll see what I can do.

-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread Warren Young
On Jun 9, 2016, at 6:01 PM, Scott Robison  wrote:
> 
> On Thu, Jun 9, 2016 at 2:12 PM, Warren Young  wrote:
> On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
> >
> > The bug:
> > In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> > 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).
> 
> I took a look at that code, and there is no possibility for it to be correct. 
>  It doesn’t even try to consider 3- and 4- byte sequences.
> 
> It does consider 3 & 4 byte sequences in a round about way.

I don’t see that it is checking that the top 2 bits of bytes 3 and 4 are 10, 
the only legal values.

Without that, your tests cannot rule out some illegal values.

That’s why I suggested that this be rewritten in binary.  I’d be happier with 
something like this pseudocode:

   if ((c[0] & 0b11111000) == 0b11110000 && len(c) >= 4) {
      // check following 3 bytes for top bits == 10
      c += 3;  // don’t recheck them
   }
   else if ((c[0] & 0b11110000) == 0b11100000 && len(c) >= 3) {
      // same as above, but “2” instead of “3”
   }
   // etc

The corner cases like the 0x10FFFF limit still need to be covered, of course,
but only after the checker assures itself that it has a valid “raw UTF-8” 
value.  (“Raw” meaning it passes the basic bit encoding patterns and is now 
being decoded to make sure it is also a legal Unicode value.)
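
A compilable take on that two-stage idea might look like the sketch below. The assumptions are mine: the function name utf8_is_valid is made up, hex masks stand in for binary literals (with the bit patterns in comments) so that it builds as plain C, and the second stage implements strict UTF-8, which is tighter than what Fossil's invalid_utf8() is described as accepting in this thread.

```c
#include <stddef.h>

/* Sketch only. Returns 1 if z[0..n-1] is strictly well-formed UTF-8, else 0. */
static int utf8_is_valid(const unsigned char *z, size_t n){
  size_t i = 0;
  while( i<n ){
    unsigned char c = z[i];
    size_t len, k;
    unsigned long cp;

    /* Stage 1: "raw" check of the bit patterns. */
    if( (c & 0x80)==0x00 ){ len = 1; cp = c; }              /* 0xxxxxxx */
    else if( (c & 0xE0)==0xC0 ){ len = 2; cp = c & 0x1F; }  /* 110xxxxx */
    else if( (c & 0xF0)==0xE0 ){ len = 3; cp = c & 0x0F; }  /* 1110xxxx */
    else if( (c & 0xF8)==0xF0 ){ len = 4; cp = c & 0x07; }  /* 11110xxx */
    else return 0;            /* stray 10xxxxxx, or 0xF8..0xFF */
    if( n-i<len ) return 0;   /* truncated sequence */
    for(k=1; k<len; k++){
      if( (z[i+k] & 0xC0)!=0x80 ) return 0;   /* top two bits must be 10 */
      cp = (cp<<6) | (z[i+k] & 0x3F);
    }

    /* Stage 2: the encoding is well formed; is the value legal? */
    if( len==2 && cp<0x80 ) return 0;          /* overlong */
    if( len==3 && cp<0x800 ) return 0;         /* overlong */
    if( len==4 && cp<0x10000 ) return 0;       /* overlong */
    if( cp>=0xD800 && cp<=0xDFFF ) return 0;   /* UTF-16 surrogate */
    if( cp>0x10FFFF ) return 0;                /* above the Unicode limit */

    i += len;                 /* skip the bytes just checked */
  }
  return 1;
}
```

Keeping the stages separate means that each return in stage 1 corresponds to one bit pattern and each return in stage 2 to one rule about code point values, which makes a wrong verdict easier to trace.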

> I just committed a one line fix (with multiple lines of comments to clarify 
> what the code is doing in the tricky part).

Thank you!

Perhaps you will also extend Fossil’s test suite in this area.  A bit of 
Googling turns up these UTF-8 test corpora:

  https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
  http://www.columbia.edu/~fdc/utf8/


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread rosscanning
Thanks! That patch works for me.
 
Ross
 
 
On Fri, Jun 10, 2016, at 10:01 AM, Scott Robison wrote:
> On Thu, Jun 9, 2016 at 2:12 PM, Warren Young  wrote:
>> On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
>>  >
>>  > The bug:
>>  > In lookslike.c, invalid_utf8() returns 'invalid' for the input
>>  > 0xE0,
>>  > 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).
>>
>>  I took a look at that code, and there is no possibility for it to be
>>  correct.  It doesn’t even try to consider 3- and 4- byte sequences.
>
> It does consider 3 & 4 byte sequences in a round about way. I just
> committed a one line fix (with multiple lines of comments to clarify
> what the code is doing in the tricky part).
>
> --
> Scott Robison
>


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread Scott Robison
On Thu, Jun 9, 2016 at 2:12 PM, Warren Young  wrote:

> On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
> >
> > The bug:
> > In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> > 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).
>
> I took a look at that code, and there is no possibility for it to be
> correct.  It doesn’t even try to consider 3- and 4- byte sequences.
>

It does consider 3 & 4 byte sequences in a roundabout way. I just
committed a one line fix (with multiple lines of comments to clarify what
the code is doing in the tricky part).

-- 
Scott Robison


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread Warren Young
On Jun 9, 2016, at 3:21 PM, rosscann...@fastmail.com wrote:
> 
> - it's massive;

It’s also open source under one of the most liberal licenses available.  Fossil 
could just swipe the one header file it needs.  It hasn’t changed since 2005, 
so one may presume that it is stable.

> - it has dependencies between its modules, so even if you just want a
> tiny part of it, you might need other parts;

That header does include others, so yes, someone would have to work out whether 
you run into an untenable dependency chain.  I haven’t looked deeply into it, 
but I suspect it could be boiled down to a single reasonably-small header file.

> - it has a bewildering array of compiler options.

Much of Boost is preprocessor- or template-only code, not requiring that you 
build the Boost libraries at all.  I have personally never used any of the 
Boost compiled libraries, not wanting to distribute them as dependencies on 
systems that don’t include Boost in the OS’s package repo, some of which we 
still need to support.

> One of the things I like about fossil is how quickly and easily it
> builds! It would be a shame to change that.

Agreed.  But I think we’re only talking about adding one smallish header file 
here, not making all of Boost a prerequisite.

> I like the idea of bit-manipulation macros, but they would be quite easy
> to craft ad hoc.

Patches thoughtfully considered. :)


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread rosscanning
I'm new to this list, so take my opinion lightly, but I would hesitate
to introduce Boost into the fossil build if it's not there already.
Boost adds a long list of complications to any build it's part of:
- it's massive;
- it has dependencies between its modules, so even if you just want a
tiny part of it, you might need other parts;
- it has a bewildering array of compiler options.

One of the things I like about fossil is how quickly and easily it
builds! It would be a shame to change that.

I like the idea of bit-manipulation macros, but they would be quite easy
to craft ad hoc.
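
As a rough idea of what such ad hoc macros could look like, here is a small sketch. The macro and function names are invented for this example and nothing here comes from the Fossil source. Each macro tests one of the UTF-8 bit patterns with a mask, and the pattern is spelled out in binary in the comment next to it.

```c
/* Ad hoc bit-pattern tests for UTF-8 bytes; all names are hypothetical. */
#define UTF8_IS_ASCII(b)  (((b) & 0x80)==0x00)   /* 0xxxxxxx */
#define UTF8_IS_CONT(b)   (((b) & 0xC0)==0x80)   /* 10xxxxxx */
#define UTF8_IS_LEAD2(b)  (((b) & 0xE0)==0xC0)   /* 110xxxxx */
#define UTF8_IS_LEAD3(b)  (((b) & 0xF0)==0xE0)   /* 1110xxxx */
#define UTF8_IS_LEAD4(b)  (((b) & 0xF8)==0xF0)   /* 11110xxx */

/* Sequence length implied by a lead byte, or 0 if b cannot start a
** sequence. Only the bit pattern is examined, so 0xC0, 0xC1 and
** 0xF5..0xF7 still report a length here. */
static int utf8_seq_len(unsigned char b){
  if( UTF8_IS_ASCII(b) ) return 1;
  if( UTF8_IS_LEAD2(b) ) return 2;
  if( UTF8_IS_LEAD3(b) ) return 3;
  if( UTF8_IS_LEAD4(b) ) return 4;
  return 0;
}
```

For the bytes from the bug report, UTF8_IS_LEAD3(0xE0) holds and both 0xB8 and 0x94 satisfy UTF8_IS_CONT.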

Ross



On Fri, Jun 10, 2016, at 06:12 AM, Warren Young wrote:
> On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
> > 
> > The bug:
> > In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> > 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).
> 
> I took a look at that code, and there is no possibility for it to be
> correct.  It doesn’t even try to consider 3- and 4- byte sequences.
> 
> May I suggest that whoever rewrites this use BOOST_BINARY?
> 
>   http://www.boost.org/doc/libs/1_61_0/libs/utility/utility.htm#BOOST_BINARY
> 
> Despite being from Boost, it is implemented purely in C preprocessor
> code, so it should work within Fossil.
> 
> I make this suggestion because it seems to me that the key source of the
> error (errors?) in this code comes from trying to work at the hex level
> on a problem that is inherently about bitwise encoding.
> 
> There’s an interesting discussion of the rationale for C not having a
> binary literal syntax here:
> 
>   http://stackoverflow.com/q/18244726
> 
> C++14 has one, though:
> 
>   https://en.wikipedia.org/wiki/C%2B%2B14#Binary_literals
> 
> I don’t suppose Fossil could get away with using the nonstandard
> extensions supported by GCC and Clang?  That won’t cover native Windows,
> but Visual C++ 2015 supports the C++14 syntax; I’ve tested it here, and
> it’s accepted in C code, too.


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread Warren Young
On Jun 9, 2016, at 6:25 AM, rosscann...@fastmail.com wrote:
> 
> The bug:
> In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14).

I took a look at that code, and there is no possibility for it to be correct.  
It doesn’t even try to consider 3- and 4- byte sequences.

May I suggest that whoever rewrites this use BOOST_BINARY?

  http://www.boost.org/doc/libs/1_61_0/libs/utility/utility.htm#BOOST_BINARY

Despite being from Boost, it is implemented purely in C preprocessor code, so 
it should work within Fossil.

I make this suggestion because it seems to me that the key source of the error 
(errors?) in this code comes from trying to work at the hex level on a problem 
that is inherently about bitwise encoding.

There’s an interesting discussion of the rationale for C not having a binary 
literal syntax here:

  http://stackoverflow.com/q/18244726

C++14 has one, though:

  https://en.wikipedia.org/wiki/C%2B%2B14#Binary_literals

I don’t suppose Fossil could get away with using the nonstandard extensions 
supported by GCC and Clang?  That won’t cover native Windows, but Visual C++ 
2015 supports the C++14 syntax; I’ve tested it here, and it’s accepted in C 
code, too.


Re: [fossil-users] how to report bug in fossil

2016-06-09 Thread Joe Mistachkin

rosscann...@fastmail.com wrote:
>
> The bug:
> In lookslike.c, invalid_utf8() returns 'invalid' for the input 0xE0,
> 0xB8, 0x94, which is the Thai character 'do dek' (U+0E14). This can be
> easily reproduced by trying to commit a file that contains those three
> bytes and nothing else - you will get the "this file contains invalid
> UTF-8..." warning.
> 

Thanks for the report.  Jan, any hints on this one?

--
Joe Mistachkin
