On 3/27/2017 6:21 PM, Richard Hipp wrote:
> On 3/27/17, Ross Berteig <r...@cheshireeng.com> wrote:
>> I believe that a line is too long if it is more than about 8191 ASCII
>> characters, a restriction based on the size of the buffer used in the
>> diff engine.
> Technically, that restriction is due to the way hashes are computed on
> individual lines during the diff.  For diffing, the file is broken up
> into individual lines, and every line is given a 32-bit hash that
> helps to speed up locating the differences.  The lower 13 bits of the
> hash are the length of the line in bytes.  The upper 19 bits are the
> actual hash.
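
As a rough illustration of the layout described above (13 low bits of length, 19 high bits of hash), here is a minimal sketch in C. It is not copied from diff.c: the helper names pack_line, packed_length, packed_hash, and demo are hypothetical, and the value of LENGTH_MASK_SZ is assumed from the 13-bit figure in the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the description: the low 13 bits hold the line length. */
#define LENGTH_MASK_SZ 13
#define LENGTH_MASK    ((1u << LENGTH_MASK_SZ) - 1)   /* 8191 */

/* Hypothetical helper: pack a line hash and a byte length into one
** 32-bit word.  Returns 0 on success, or -1 if the length does not
** fit in the 13-bit field (the "line too long" case). */
static int pack_line(uint32_t hash, size_t len, uint32_t *out){
  if( len > LENGTH_MASK ) return -1;
  /* The shift drops the top 13 bits of hash, leaving 19 bits of it. */
  *out = (hash << LENGTH_MASK_SZ) | (uint32_t)len;
  return 0;
}

/* Recover the two fields from the packed word. */
static uint32_t packed_length(uint32_t h){ return h & LENGTH_MASK; }
static uint32_t packed_hash(uint32_t h){ return h >> LENGTH_MASK_SZ; }

/* Small usage check: a 100-byte line round-trips, 8192 bytes is rejected. */
static void demo(void){
  uint32_t h;
  assert( pack_line(0x12345u, 100, &h)==0 );
  assert( packed_length(h)==100 && packed_hash(h)==0x12345u );
  assert( pack_line(0x12345u, 8192, &h)==-1 );
}
```

With this layout the longest representable line is 8191 bytes, which matches the limit mentioned at the top of the thread.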

Interesting. I didn't read further into the code than the definition of LENGTH_MASK and the comment that describes it in diff.c. I did wonder slightly at the name of that symbol, but it was described as the length of a line so I just ran with it. In lookslike.c we have UTF16_LENGTH_MASK which is described by the comment as being the same quantity expressed for UTF16 chars.

But the comment and definition don't seem to agree. Richard, take a look at
https://www.fossil-scm.org/index.html/artifact?name=3ac38fafa91d274c&ln=220-226
Line 225 would compute UTF16_LENGTH_MASK to be 13-2-1 or 10, and get 1023 for UTF16_LENGTH_MASK. But the comment says 4096....
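
To make the arithmetic concrete, here is a small C sketch of two ways the size expression could be read. It assumes a 13-bit length field and a 2-byte wide character; the helper names are hypothetical, and which grouping the macro in lookslike.c actually uses should be checked against the linked artifact.

```c
#include <stddef.h>

#define LENGTH_MASK_SZ 13   /* assumed: width of the length field in bits */

/* Hypothetical helper: a mask with the low `bits` bits set. */
static unsigned mask_for_bits(unsigned bits){
  return (1u << bits) - 1;
}

/* Reading A: 13 - 2 - 1, the computation described in this message. */
static unsigned bits_flat(size_t wchar_size){
  return LENGTH_MASK_SZ - (unsigned)wchar_size - 1;
}

/* Reading B: 13 - (2 - 1), if the subtraction is parenthesized. */
static unsigned bits_grouped(size_t wchar_size){
  return LENGTH_MASK_SZ - ((unsigned)wchar_size - 1);
}
```

For a 2-byte character, reading A gives 10 bits and a mask of 1023, while reading B gives 12 bits and a mask of 4095, which is one less than the 4096 in the comment.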

Either the code, the comment, or I am confused here. Since I'm poking at test cases for this stuff, I'll see if I can add one that probes the UTF16 line length question.

--
Ross Berteig                               r...@cheshireeng.com
Cheshire Engineering Corp.           http://www.CheshireEng.com/
+1 626 303 1602
