On Thu, Jan 28, 2010 at 06:29:41PM +0100, Stefan Sperling wrote:
> On Thu, Jan 28, 2010 at 06:10:38PM +0100, Daniel Näslund wrote:
> > > And we could rename the function to copy_hunk() which is even shorter.
> > 
> > To copy a hunk is ambigous. When the hunk is rejected we want to copy
> > the diff_text, when it is accepted we want the modified_text. I'd say,
> > leave as is.
> 
> But HI->rejected will tell you if the hunk was rejected! :)

Gah, I forgot about the rejected field.
 
> And you can pass the entire patch_target_t to give copy_hunk_lines()
> access to both the patched and rejected streams.
> 
> We could rename it to emit_hunk(), flush_hunk(), hunk_done(), or something
> similar, signifying that the function is responsible for dealing with
> a hunk which has been processed.
> 
> The function itself can figure out what to do with the hunk, based on
> information in the hunk_info_t, rather than having the caller figure it out.


Fixed, but with doubts. Passing only hi instead of (hi, n, fuzz) was
fine but only passing target made it harder to understand why the caller
calls copy_hunk(). But I've done it so I couldn't have been totally
against it.

A new log message.

[[[
Fix #3460 - svn patch is not fuzzy when applying unidiffs. Refactor some
parts of the hunk parsing while at it.

* subversion/include/private/svn_diff_private.h
  (svn_hunk_t): Add fields leading_context and trailing_context. They are
    used for determining if there is enough context to apply a patch
    with fuzz.

* subversion/libsvn_diff/parse-diff.c
  (parse_next_hunk): Count number of lines of context at start and end
    of hunk and save the information in hunk. Refactored some if
    statements for increased readability.

* subversion/tests/cmdline/patch_tests.py
  (patch_with_fuzz): New.
  (test_list): Add new test.

* subversion/libsvn_client/patch.c
  (hunk_info_t): Add field fuzz.
  (match_hunk): Add new parameter fuzz. Record nr of lines read and
    ignore the ones at the beginning and end of the hunk that is fuzzy.
    Use leading_context and trailing_context to ignore the cases when
    there isn't enough context to do fuzzy patching.
  (scan_for_match): Add new parameter fuzz. Call match_hunk() with fuzz.
  (get_hunk_info): Add new parameter fuzz. Call scan_for_match() with
    fuzz. Save the used fuzz in hi to be used when the hunk should be
    copied to the target.
  (copy_hunk_text): Rename this to ..
  (copy_hunk): .. And use just hi and target as parameters. We can
    detect what kind of hunk we're dealing with inside the function. 
    Record read_lines and only copy those who are not within the fuzz
    limit. They have already been or will be copied from the target. If
    we're dealing with a rejected hunk we copy all lines.
  (apply_one_hunk): Adjust lines copied to the target to include the
    context lines at the beginning of the hunk who are fuzzy. Adjust the
    current_line of the target to point to the last line of the hunk
    minus fuzz.
  (apply_one_patch): Try to call get_hunk_info() with no fuzz. If we get
    no matching line try with fuzz 1 and if that fails try with fuzz 2.

Patch by: Daniel Näslund <daniel{_AT_}longitudo.com>
Review by: julianfoad
           stsp
]]]

Daniel
Index: subversion/libsvn_diff/parse-diff.c
===================================================================
--- subversion/libsvn_diff/parse-diff.c	(revision 904119)
+++ subversion/libsvn_diff/parse-diff.c	(arbetskopia)
@@ -224,6 +224,9 @@
   svn_stream_t *original_text;
   svn_stream_t *modified_text;
   svn_linenum_t original_lines;
+  svn_linenum_t leading_context  = 0;
+  svn_linenum_t trailing_context  = 0;
+  svn_boolean_t changed_line_seen  = FALSE;
   apr_pool_t *iterpool;
 
   if (apr_file_eof(patch->patch_file) == APR_EOF)
@@ -278,21 +281,29 @@
             }
 
           c = line->data[0];
-          if (original_lines > 0 && (c == ' ' || c == '-'))
+          /* Tolerate chopped leading spaces on empty lines. */
+          if (original_lines > 0 && (c == ' ' || (! eof && line->len == 0)))
             {
               hunk_seen = TRUE;
               original_lines--;
+              if (changed_line_seen)
+                trailing_context++;
+              else
+                leading_context++;
             }
-          else if (c == '+')
+          else if (c == '+' || c == '-')
             {
               hunk_seen = TRUE;
+              changed_line_seen = TRUE;
+
+              /* A hunk may have context in the middle. We only want the
+                 last lines of context. */
+              if (trailing_context > 0)
+                trailing_context = 0;
+
+              if (original_lines > 0 && c == '-')
+                original_lines--;
             }
-          /* Tolerate chopped leading spaces on empty lines. */
-          else if (original_lines > 0 && ! eof && line->len == 0)
-            {
-              hunk_seen = TRUE;
-              original_lines--;
-            }
           else
             {
               in_hunk = FALSE;
@@ -363,6 +374,8 @@
       (*hunk)->diff_text = diff_text;
       (*hunk)->original_text = original_text;
       (*hunk)->modified_text = modified_text;
+      (*hunk)->leading_context = leading_context;
+      (*hunk)->trailing_context = trailing_context;
     }
   else
     /* Something went wrong, just discard the result. */
Index: subversion/tests/cmdline/patch_tests.py
===================================================================
--- subversion/tests/cmdline/patch_tests.py	(revision 904119)
+++ subversion/tests/cmdline/patch_tests.py	(arbetskopia)
@@ -967,7 +967,137 @@
                                        None, # expected err
                                        1, # check-props
                                        1) # dry-run
+def patch_with_fuzz(sbox):
+  "apply a patch with fuzz"
 
+  sbox.build()
+  wc_dir = sbox.wc_dir
+  patch_file_path = tempfile.mkstemp(dir=os.path.abspath(svntest.main.temp_dir))[1]
+
+  mu_path = os.path.join(wc_dir, 'A', 'mu')
+
+  # We have replaced a couple of lines to cause fuzz. Those lines contains
+  # the word fuzz
+  mu_contents = [
+    "Line replaced for fuzz = 1\n",
+    "\n",
+    "We wish to congratulate you over your email success in our computer\n",
+    "Balloting. This is a Millennium Scientific Electronic Computer Draw\n",
+    "in which email addresses were used. All participants were selected\n",
+    "through a computer ballot system drawn from over 100,000 company\n",
+    "and 50,000,000 individual email addresses from all over the world.\n",
+    "Line replaced for fuzz = 2 with only the second context line changed\n",
+    "Your email address drew and have won the sum of  750,000 Euros\n",
+    "( Seven Hundred and Fifty Thousand Euros) in cash credited to\n",
+    "file with\n",
+    "    REFERENCE NUMBER: ESP/WIN/008/05/10/MA;\n",
+    "    WINNING NUMBER : 14-17-24-34-37-45-16\n",
+    "    BATCH NUMBERS :\n",
+    "    EULO/1007/444/606/08;\n",
+    "    SERIAL NUMBER: 45327\n",
+    "and PROMOTION DATE: 13th June. 2009\n",
+    "\n",
+    "To claim your winning prize, you are to contact the appointed\n",
+    "agent below as soon as possible for the immediate release of your\n",
+    "winnings with the below details.\n",
+    "\n",
+    "Line replaced for fuzz = 2\n",
+    "Line replaced for fuzz = 2\n",
+  ]
+
+  # Set mu contents
+  svntest.main.file_write(mu_path, ''.join(mu_contents))
+  expected_output = svntest.wc.State(wc_dir, {
+    'A/mu'       : Item(verb='Sending'),
+    })
+  expected_status = svntest.actions.get_virginal_state(wc_dir, 1)
+  expected_status.tweak('A/mu', wc_rev=2)
+  svntest.actions.run_and_verify_commit(wc_dir, expected_output,
+                                      expected_status, None, wc_dir)
+
+  unidiff_patch = [
+    "Index: mu\n",
+    "===================================================================\n",
+    "--- A/mu\t(revision 0)\n",
+    "+++ A/mu\t(revision 0)\n",
+    "@@ -1,6 +1,7 @@\n",
+    " Dear internet user,\n",
+    " \n",
+    " We wish to congratulate you over your email success in our computer\n",
+    "+A new line here\n",
+    " Balloting. This is a Millennium Scientific Electronic Computer Draw\n",
+    " in which email addresses were used. All participants were selected\n",
+    " through a computer ballot system drawn from over 100,000 company\n",
+    "@@ -7,6 +8,7 @@\n",
+    " and 50,000,000 individual email addresses from all over the world.\n",
+    " \n",
+    " Your email address drew and have won the sum of  750,000 Euros\n",
+    "+Another new line\n",
+    " ( Seven Hundred and Fifty Thousand Euros) in cash credited to\n",
+    " file with\n",
+    "     REFERENCE NUMBER: ESP/WIN/008/05/10/MA;\n",
+    "@@ -19,6 +20,7 @@\n",
+    " To claim your winning prize, you are to contact the appointed\n",
+    " agent below as soon as possible for the immediate release of your\n",
+    " winnings with the below details.\n",
+    "+A third new line\n",
+    " \n",
+    " Again, we wish to congratulate you over your email success in our\n"
+    " computer Balloting.\n"
+  ]
+
+  svntest.main.file_write(patch_file_path, ''.join(unidiff_patch))
+
+  mu_contents = [
+    "Line replaced for fuzz = 1\n",
+    "\n",
+    "We wish to congratulate you over your email success in our computer\n",
+    "A new line here\n",
+    "Balloting. This is a Millennium Scientific Electronic Computer Draw\n",
+    "in which email addresses were used. All participants were selected\n",
+    "through a computer ballot system drawn from over 100,000 company\n",
+    "and 50,000,000 individual email addresses from all over the world.\n",
+    "Line replaced for fuzz = 2 with only the second context line changed\n",
+    "Your email address drew and have won the sum of  750,000 Euros\n",
+    "Another new line\n",
+    "( Seven Hundred and Fifty Thousand Euros) in cash credited to\n",
+    "file with\n",
+    "    REFERENCE NUMBER: ESP/WIN/008/05/10/MA;\n",
+    "    WINNING NUMBER : 14-17-24-34-37-45-16\n",
+    "    BATCH NUMBERS :\n",
+    "    EULO/1007/444/606/08;\n",
+    "    SERIAL NUMBER: 45327\n",
+    "and PROMOTION DATE: 13th June. 2009\n",
+    "\n",
+    "To claim your winning prize, you are to contact the appointed\n",
+    "agent below as soon as possible for the immediate release of your\n",
+    "winnings with the below details.\n",
+    "A third new line\n",
+    "\n",
+    "Line replaced for fuzz = 2\n",
+    "Line replaced for fuzz = 2\n",
+  ]
+
+  expected_output = [
+    'U         %s\n' % os.path.join(wc_dir, 'A', 'mu'),
+  ]
+  expected_disk = svntest.main.greek_state.copy()
+  expected_disk.tweak('A/mu', contents=''.join(mu_contents))
+
+  expected_status = svntest.actions.get_virginal_state(wc_dir, 1)
+  expected_status.tweak('A/mu', status='M ', wc_rev=2)
+
+  expected_skip = wc.State('', { })
+
+  svntest.actions.run_and_verify_patch(wc_dir, os.path.abspath(patch_file_path),
+                                       expected_output,
+                                       expected_disk,
+                                       expected_status,
+                                       expected_skip,
+                                       None, # expected err
+                                       1, # check-props
+                                       1) # dry-run
+
 def patch_keywords(sbox):
   "apply a patch containing keywords"
 
@@ -1042,6 +1172,7 @@
               patch_add_new_dir,
               patch_reject,
               patch_keywords,
+              patch_with_fuzz,
             ]
 
 if __name__ == '__main__':
Index: subversion/include/private/svn_diff_private.h
===================================================================
--- subversion/include/private/svn_diff_private.h	(revision 904119)
+++ subversion/include/private/svn_diff_private.h	(arbetskopia)
@@ -81,6 +81,12 @@
   svn_linenum_t original_length;
   svn_linenum_t modified_start;
   svn_linenum_t modified_length;
+
+  /* Number of lines starting with ' ' before first '+' or '-'. */
+  svn_linenum_t leading_context;
+
+  /* Number of lines starting with ' ' after last '+' or '-'. */
+  svn_linenum_t trailing_context;
 } svn_hunk_t;
 
 /* Data type to manage parsing of patches. */
Index: subversion/libsvn_client/patch.c
===================================================================
--- subversion/libsvn_client/patch.c	(revision 904119)
+++ subversion/libsvn_client/patch.c	(arbetskopia)
@@ -52,6 +52,10 @@
 
   /* Whether this hunk has been rejected. */
   svn_boolean_t rejected;
+
+  /* The fuzz factor used when matching this hunk, i.e. how many
+   * lines of leading and trailing context to ignore during matching. */
+  int fuzz;
 } hunk_info_t;
 
 typedef struct {
@@ -588,18 +592,19 @@
   return SVN_NO_ERROR;
 }
 
-/* Indicate in *MATCHED whether the original text of HUNK
- * matches the patch TARGET at its current line.
- * When this function returns, neither TARGET->CURRENT_LINE nor the
- * file offset in the target file will have changed.
- * HUNK->ORIGINAL_TEXT will be reset.
- * Do temporary allocations in POOL. */
+/* Indicate in *MATCHED whether the original text of HUNK matches the patch
+ * TARGET at its current line. Lines within FUZZ lines of the start or end
+ * of HUNK will always match. When this function returns, neither
+ * TARGET->CURRENT_LINE nor the file offset in the target file will have
+ * changed. HUNK->ORIGINAL_TEXT will be reset.  Do temporary allocations in
+ * POOL. */
 static svn_error_t *
 match_hunk(svn_boolean_t *matched, patch_target_t *target,
-           const svn_hunk_t *hunk, apr_pool_t *pool)
+           const svn_hunk_t *hunk, int fuzz, apr_pool_t *pool)
 {
   svn_stringbuf_t *hunk_line;
   const char *target_line;
+  svn_linenum_t lines_read = 0;
   svn_linenum_t saved_line;
   svn_boolean_t hunk_eof;
   svn_boolean_t lines_matched;
@@ -630,9 +635,19 @@
                                            eol_str, FALSE,
                                            target->keywords, FALSE,
                                            iterpool));
+      lines_read++;
       SVN_ERR(read_line(target, &target_line, iterpool, iterpool));
+
       if (! hunk_eof)
-        lines_matched = ! strcmp(hunk_line_translated, target_line);
+        {
+          if (lines_read <= fuzz && hunk->leading_context > fuzz)
+            lines_matched = TRUE;
+          else if (lines_read > hunk->original_length - fuzz &&
+                   hunk->trailing_context > fuzz)
+            lines_matched = TRUE;
+          else
+            lines_matched = ! strcmp(hunk_line_translated, target_line);
+        }
     }
   while (lines_matched && ! (hunk_eof || target->eof));
 
@@ -658,7 +673,7 @@
 }
 
 /* Scan lines of TARGET for a match of the original text of HUNK,
- * up to but not including the specified UPPER_LINE.
+ * up to but not including the specified UPPER_LINE. Use fuzz factor FUZZ.
  * If UPPER_LINE is zero scan until EOF occurs when reading from TARGET.
  * Return the line at which HUNK was matched in *MATCHED_LINE.
  * If the hunk did not match at all, set *MATCHED_LINE to zero.
@@ -670,7 +685,7 @@
 static svn_error_t *
 scan_for_match(svn_linenum_t *matched_line, patch_target_t *target,
                const svn_hunk_t *hunk, svn_boolean_t match_first,
-               svn_linenum_t upper_line, apr_pool_t *pool)
+               svn_linenum_t upper_line, int fuzz, apr_pool_t *pool)
 {
   apr_pool_t *iterpool;
 
@@ -684,7 +699,7 @@
 
       svn_pool_clear(iterpool);
 
-      SVN_ERR(match_hunk(&matched, target, hunk, iterpool));
+      SVN_ERR(match_hunk(&matched, target, hunk, fuzz, iterpool));
       if (matched)
         {
           svn_boolean_t taken = FALSE;
@@ -720,14 +735,14 @@
 
 /* Determine the line at which a HUNK applies to the TARGET file,
  * and return an appropriate hunk_info object in *HI, allocated from
- * RESULT_POOL. If no correct line can be determined,
- * set HI->MATCHED_LINE to zero.
+ * RESULT_POOL. Use fuzz factor FUZZ. Set HI->FUZZ to FUZZ. If no correct line can
+ * be determined, set HI->MATCHED_LINE to zero.
  * When this function returns, neither TARGET->CURRENT_LINE nor the
  * file offset in the target file will have changed.
  * Do temporary allocations in POOL. */
 static svn_error_t *
 get_hunk_info(hunk_info_t **hi, patch_target_t *target,
-              const svn_hunk_t *hunk, apr_pool_t *result_pool,
+              const svn_hunk_t *hunk, int fuzz, apr_pool_t *result_pool,
               apr_pool_t *scratch_pool)
 {
   svn_linenum_t matched_line;
@@ -748,7 +763,7 @@
        * should be going. */
       SVN_ERR(seek_to_line(target, hunk->original_start, scratch_pool));
       SVN_ERR(scan_for_match(&matched_line, target, hunk, TRUE,
-                             hunk->original_start + 1, scratch_pool));
+                             hunk->original_start + 1, fuzz, scratch_pool));
       if (matched_line != hunk->original_start)
         {
           /* Scan the whole file again from the start. */
@@ -757,7 +772,7 @@
           /* Scan forward towards the hunk's line and look for a line
            * where the hunk matches. */
           SVN_ERR(scan_for_match(&matched_line, target, hunk, FALSE,
-                                 hunk->original_start, scratch_pool));
+                                 hunk->original_start, fuzz, scratch_pool));
 
           /* In tie-break situations, we arbitrarily prefer early matches
            * to save us from scanning the rest of the file. */
@@ -766,7 +781,7 @@
               /* Scan forward towards the end of the file and look
                * for a line where the hunk matches. */
               SVN_ERR(scan_for_match(&matched_line, target, hunk, TRUE, 0,
-                                     scratch_pool));
+                                     fuzz, scratch_pool));
             }
         }
 
@@ -778,6 +793,7 @@
   (*hi)->hunk = hunk;
   (*hi)->matched_line = matched_line;
   (*hi)->rejected = FALSE;
+  (*hi)->fuzz = fuzz;
 
   return SVN_NO_ERROR;
 }
@@ -830,17 +846,36 @@
   return SVN_NO_ERROR;
 }
 
-/* Copy HUNK_TEXT into TARGET, line by line, such that the line filter
- * and transformation callbacks set on HUNK_TEXT by the diff parsing
- * code in libsvn_diff will trigger. ABSPATH is the absolute path to the
- * file underlying TARGET. */
+/* Copy the appropiate text from HI into TARGET, line by line, such that the line filter
+ * and transformation callbacks set on the HI stream by the diff parsing
+ * code in libsvn_diff will trigger. */
 static svn_error_t *
-copy_hunk_text(svn_stream_t *hunk_text, svn_stream_t *target,
-               const char *abspath, apr_pool_t *scratch_pool)
+copy_hunk(hunk_info_t *hi, patch_target_t *target,
+          apr_pool_t *scratch_pool)
 {
   svn_boolean_t eof;
+  svn_stream_t *hunk_text;
+  svn_stream_t *stream;
+  const char *abspath;
+  int n;
+  int fuzz = hi->fuzz;
   apr_pool_t *iterpool;
+  int read_lines = 0;
 
+  if (hi->rejected)
+    {
+      hunk_text = hi->hunk->diff_text;
+      stream = target->reject;
+      abspath = target->reject_path;
+    }
+  else
+    {
+      hunk_text = hi->hunk->modified_text;
+      stream = target->patched;
+      abspath = target->patched_path;
+      n = hi->hunk->modified_length;
+    }
+
   iterpool = svn_pool_create(scratch_pool);
   do
     {
@@ -851,15 +886,30 @@
 
       SVN_ERR(svn_stream_readline_detect_eol(hunk_text, &line, &eol_str,
                                              &eof, iterpool));
-      if (! eof)
+      read_lines++;
+
+      if (! eof && hi->rejected)
         {
           if (line->len >= 1)
-            SVN_ERR(try_stream_write(target, abspath, line->data, line->len,
+            SVN_ERR(try_stream_write(stream, abspath, line->data, line->len,
                                      iterpool));
           if (eol_str)
-            SVN_ERR(try_stream_write(target, abspath, eol_str, strlen(eol_str),
+            SVN_ERR(try_stream_write(stream, abspath, eol_str, strlen(eol_str),
                                      iterpool));
         }
+
+      /* We only want to copy the lines without fuzz. Fuzz means that the
+       * target is changed on those lines. We should use the targets
+       * version then. */
+      else if (! eof && read_lines > fuzz && read_lines <= n - fuzz)
+        {
+          if (line->len >= 1)
+            SVN_ERR(try_stream_write(stream, abspath, line->data, line->len,
+                                     iterpool));
+          if (eol_str)
+            SVN_ERR(try_stream_write(stream, abspath, eol_str, strlen(eol_str),
+                                     iterpool));
+        }
     }
   while (! eof);
   svn_pool_destroy(iterpool);
@@ -886,24 +936,29 @@
   len = strlen(hunk_header);
   SVN_ERR(svn_stream_write(target->reject, hunk_header, &len));
 
-  SVN_ERR(copy_hunk_text(hi->hunk->diff_text, target->reject,
-                         target->reject_path, pool));
-
   target->had_rejects = TRUE;
   hi->rejected = TRUE;
 
+  SVN_ERR(copy_hunk(hi, target, pool));
+
+
   return SVN_NO_ERROR;
 }
 
-/* Apply a hunk described by hunk info HI to a patch TARGET.
+/* Apply a hunk described by hunk info HI to a patch TARGET. If we have FUZZ
+ * use the lines from the target for those lines instead of the hunk lines.
  * Do all allocations in POOL. */
 static svn_error_t *
-apply_one_hunk(patch_target_t *target, hunk_info_t *hi, apr_pool_t *pool)
+apply_one_hunk(patch_target_t *target, hunk_info_t *hi, 
+               apr_pool_t *pool)
 {
+  svn_linenum_t last_line;
+
   if (target->kind == svn_node_file)
     {
-      /* Move forward to the hunk's line, copying data as we go. */
-      SVN_ERR(copy_lines_to_target(target, hi->matched_line, pool));
+      /* Move forward to the hunk's line, copying data as we go. We want to
+       * use the targets version for the fuzzy lines of the hunk. */
+      SVN_ERR(copy_lines_to_target(target, hi->matched_line + hi->fuzz, pool));
       if (target->eof)
         {
           /* File is shorter than it should be. */
@@ -912,14 +967,15 @@
         }
 
       /* Skip the target's version of the hunk. */
+      last_line = target->current_line + 
+        hi->hunk->original_length - (2 * hi->fuzz);
       SVN_ERR(seek_to_line(target,
-                           target->current_line + hi->hunk->original_length,
+                           last_line,
                            pool));
     }
 
   /* Copy the patched hunk text into the patched stream. */
-  SVN_ERR(copy_hunk_text(hi->hunk->modified_text, target->patched,
-                         target->patched_path, pool));
+  SVN_ERR(copy_hunk(hi, target, pool));
 
   return SVN_NO_ERROR;
 }
@@ -1026,6 +1082,7 @@
   apr_finfo_t working_file;
   apr_finfo_t patched_file;
   int i;
+  int MAX_FUZZ = 2;
 
   SVN_ERR(init_patch_target(&target, patch, abs_wc_path, ctx, strip_count,
                             pool, pool));
@@ -1044,13 +1101,21 @@
     {
       svn_hunk_t *hunk;
       hunk_info_t *hi;
+      int fuzz = 0;
 
       svn_pool_clear(iterpool);
 
       hunk = APR_ARRAY_IDX(patch->hunks, i, svn_hunk_t *);
 
-      /* Determine the line the hunk should be applied at. */
-      SVN_ERR(get_hunk_info(&hi, target, hunk, pool, iterpool));
+      /* Determine the line the hunk should be applied at. If no match is
+       * found initially, try with fuzz. */
+      do 
+        {
+          SVN_ERR(get_hunk_info(&hi, target, hunk, fuzz, pool, iterpool));
+          fuzz++;
+        }
+      while (hi->matched_line == 0 && fuzz <= MAX_FUZZ);
+
       if (hi->matched_line == 0)
         {
           /* The hunk does not apply, reject it. */

Reply via email to