[ Taking a privately-started discussion with danielsh to the list, in case others have inspiration/insight about this. Question at hand: I'm having trouble making diff3 work with prefix/suffix scanning of the diff-optimizations-bytes branch. Any feedback is highly appreciated :-). ]
On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf <d...@daniel.shahaf.name> wrote: [...snip...] > In diff3 with prefix enabled, I was seeing failures like this: > > EXPECTED: > line1 > <<< > line2 > === > line2 modified >>>> > > ACTUAL: > <<< > line1 > line2 > === > line1 > line2 modified >>>> > > so I assumed that when the prefix code gobbled the shared prefix. How > to fix it from there was purely guesswork on my part (and I haven't > implemented it). > > These failures would go away if I prepended a "line0 left" and "line0 > right" to the file. Yes, I know. That's indeed because the prefix-scanning code gobbled up the identical prefix. That problem is also there in diff2. In diff2, I fixed this by simply pre-pending a piece of "common" diff to the diff linked list, in the function diff.c#svn_diff__diff. >From r1037371: [[[ --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c (original) +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c Sun Nov 21 02:36:07 2010 @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs, svn_diff_t *diff; svn_diff_t **diff_ref = &diff; + if (want_common && (original_start > 1)) + { + /* we have a prefix to skip */ + (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref)); + + (*diff_ref)->type = svn_diff__type_common; + (*diff_ref)->original_start = 0; + (*diff_ref)->original_length = original_start - 1; + (*diff_ref)->modified_start = 0; + (*diff_ref)->modified_length = modified_start - 1; + (*diff_ref)->latest_start = 0; + (*diff_ref)->latest_length = 0; + + diff_ref = &(*diff_ref)->next; + } + while (1) { if (original_start < lcs->position[0]->offset ]]] Naively, I wanted to do the same for diff3 (along with some other tricks, to make the algorithm aware of the prefix_lines), but it doesn't work. See the patch in attachment. I mean: it works partly: diff-diff3-test.exe passes with the attached patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and merge_tests.py (61) don't. I haven't studied the test failures in detail (I probably should, but I currently can't wrap my head around those tests). I'm now pondering another approach, to pre-pend an "lcs" piece to the lcs linked list, and let the rest of the algorithm do its work as normal. Inspiration for this came when I was reading the code of diff3.c#svn_diff__resolve_conflict, where a similar trick is used around line 121: [[[ /* If there were matching symbols at the start of * both sequences, record that fact. */ if (common_length > 0) { lcs = apr_palloc(subpool, sizeof(*lcs)); lcs->next = NULL; lcs->position[0] = start_position[0]; lcs->position[1] = start_position[1]; lcs->length = common_length; lcs_ref = &lcs->next; } ]]] Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where I'm already passing the prefix_lines as a new parameter. If that would work out, I could probably remove the special common-diff-injecting code in diff.c#svn_diff__diff. Thoughts? It will take me some time/experimentation to work out the details, but I think that should work ... Cheers, -- Johan
Index: subversion/libsvn_diff/diff3.c =================================================================== --- subversion/libsvn_diff/diff3.c (revision 1041582) +++ subversion/libsvn_diff/diff3.c (working copy) @@ -251,10 +251,14 @@ svn_diff_diff3(svn_diff_t **diff, { svn_diff__tree_t *tree; svn_diff__position_t *position_list[3]; + svn_diff_datasource_e datasource[] = {svn_diff_datasource_original, + svn_diff_datasource_modified, + svn_diff_datasource_latest}; svn_diff__lcs_t *lcs_om; svn_diff__lcs_t *lcs_ol; apr_pool_t *subpool; apr_pool_t *treepool; + apr_off_t prefix_lines = 0; *diff = NULL; @@ -263,28 +267,30 @@ svn_diff_diff3(svn_diff_t **diff, svn_diff__tree_create(&tree, treepool); + SVN_ERR(vtable->datasources_open(diff_baton, &prefix_lines, datasource, 3)); + SVN_ERR(svn_diff__get_tokens(&position_list[0], tree, diff_baton, vtable, svn_diff_datasource_original, - FALSE, - 0, + TRUE, + prefix_lines, subpool)); SVN_ERR(svn_diff__get_tokens(&position_list[1], tree, diff_baton, vtable, svn_diff_datasource_modified, - FALSE, - 0, + TRUE, + prefix_lines, subpool)); SVN_ERR(svn_diff__get_tokens(&position_list[2], tree, diff_baton, vtable, svn_diff_datasource_latest, - FALSE, - 0, + TRUE, + prefix_lines, subpool)); /* Get rid of the tokens, we don't need them to calc the diff */ @@ -295,18 +301,18 @@ svn_diff_diff3(svn_diff_t **diff, svn_pool_destroy(treepool); /* Get the lcs for original-modified and original-latest */ - lcs_om = svn_diff__lcs(position_list[0], position_list[1], 0, + lcs_om = svn_diff__lcs(position_list[0], position_list[1], prefix_lines, subpool); - lcs_ol = svn_diff__lcs(position_list[0], position_list[2], 0, + lcs_ol = svn_diff__lcs(position_list[0], position_list[2], prefix_lines, subpool); /* Produce a merged diff */ { svn_diff_t **diff_ref = diff; - apr_off_t original_start = 1; - apr_off_t modified_start = 1; - apr_off_t latest_start = 1; + apr_off_t original_start = prefix_lines + 1; + apr_off_t modified_start = prefix_lines + 1; + apr_off_t latest_start = prefix_lines + 1; apr_off_t original_sync; apr_off_t modified_sync; apr_off_t latest_sync; @@ -317,6 +323,23 @@ svn_diff_diff3(svn_diff_t **diff, svn_boolean_t is_latest; svn_diff__position_t sentinel_position[2]; + if (prefix_lines > 0) + { + /* Insert a diff_ref for the common prefix. */ + (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref)); + + (*diff_ref)->type = svn_diff__type_common; + (*diff_ref)->original_start = 0; + (*diff_ref)->original_length = prefix_lines; + (*diff_ref)->modified_start = 0; + (*diff_ref)->modified_length = prefix_lines; + (*diff_ref)->latest_start = 0; + (*diff_ref)->latest_length = prefix_lines; + (*diff_ref)->resolved_diff = NULL; + + diff_ref = &(*diff_ref)->next; + } + /* Point the position lists to the start of the list * so that common_diff/conflict detection actually is * able to work. @@ -330,7 +353,7 @@ svn_diff_diff3(svn_diff_t **diff, } else { - sentinel_position[0].offset = 1; + sentinel_position[0].offset = prefix_lines + 1; sentinel_position[0].next = NULL; position_list[1] = &sentinel_position[0]; } @@ -344,7 +367,7 @@ svn_diff_diff3(svn_diff_t **diff, } else { - sentinel_position[1].offset = 1; + sentinel_position[1].offset = prefix_lines + 1; sentinel_position[1].next = NULL; position_list[2] = &sentinel_position[1]; }