On Wed, Feb 9, 2011 at 10:20 AM, Daniel Shahaf <d...@daniel.shahaf.name> wrote: > In other news, I looked into the cause --- tried to make > datasource_get_next_token() do one more loop in the place where > currently it does 'return if at_start_of_suffix()' --- but that didn't > fix the truncation...
Indeed, that won't fix it. I think the best (cleanest, most robust, least special-cased) approach is to leave the suffix stripping as it is (and let datasource_get_next_token() stop when it encounters the suffix), but to count the number of suffix lines while scanning them (like we do for prefix). Then that suffix_lines result can be passed around, like the prefix_lines, so it ends up in the call to lcs.c#svn_diff__lcs, where a piece of "suffix_lcs" can be added to the lcs chain (just like the prefix_lcs is prepended to it). If we get there, the rest will work automagically :-). The only drawback of this is a minor loss of performance of the optimization, since we now have to count newlines while scanning the suffix (i.e. need to compare every byte scanned). But I can live with that :-). I'll see how much impact this has once I get it working. I've done part of this locally (adapting the function parameters and passing the value around), but still have to do the hard parts: - actually counting the newlines in find_identical_suffix - create and append the suffix_lcs in lcs.c#svn_diff__lcs. It might take me a couple more days to finish that. > In the meantime, I tweaked a test to make it XFail (r1068798). From > a quick glance it seemed to be relevant, but in second thought I'll > admit I didn't study the test thoroughly before making the patch. Great, thanks. The fact that it's random is a bit annoying, but at least it's something. I was thinking of writing a specific merge test (or another one in diff-diff3-test.c) with say 100 lines of identical suffix. But for now I don't have the time to do that. Maybe later ... -- Johan