I filed issue #4476 "Mergeinfo containing r0 makes svnsync and svnadmin dump 
fail" <http://subversion.tigris.org/issues/show_bug.cgi?id=4476>.

One relatively easy part to fix is 'svnadmin dump'. A dump should not fail just 
because it is trying to parse the data to notify us if there is mergeinfo 
referring to earlier revisions. It should just warn about the failure in such 
parsing, and continue. The attached patch fixes this. Before committing it, 
I'll add a test, and I'll remove the part that adds a new notification code to 
the API, in order to make it easier to back-port the commit to 1.7.x and 1.8.x. 
(The I can re-add the notification in trunk for 1.9.)

- Julian


Julian Foad wrote on 2014-03-03:
> A customer found that 'svnsync' errored out on trying to sync a revision 
> in which the source repository introduced some mergeinfo starting with r0, 
> similar to this example:
> 
>   $ svn propget svn:mergeinfo ^/foo@1234567
>   /bar:0-1000000,1111111,1222222
> 
> The svnsync error message was:
> 
>   $ svnsync sync ...
>   svnsync: E175008: At least one property change failed; repository is 
> unchanged
>   svnsync: E175002: Error setting property 'svn:mergeinfo':
>   Could not execute PROPPATCH.
> 
> We believe this mergeinfo entered the repository through a 1.6 server. It was 
> committed in mid-2012. 1.7.x servers reject such commits, as do the later 
> 1.6.x 
> servers [1]. Probably 1.6 clients were in use too, although it may have been 
> committed from a non-Subversion client such as git-svn.
> 
> Anyhow, the situation is that we have at least one Subversion repository 
> containing data that the 1.7 server tools reject as invalid. 'svnsync' 
> errors out. Even 'svnadmin dump' errors out at this revision if you 
> specify any non-zero start revision, because it parses the mergeinfo to see 
> if 
> it points to a revision earlier than the start rev. Like this:
> 
>   $ svnadmin dump --incremental -r5 repo
>   svnadmin: E200020: Invalid revision number '0' found in range list
> 
> What is our migration path for this data?
> 
> We can figure out a short term work-around, perhaps using the unsupported 
> "SVNSYNC_UNSUPPORTED_STRIP_MERGEINFO" environment variable to bypass 
> the mergeinfo change for each revision that adds or changes such mergeinfo, 
> if 
> there aren't too many of them and if they aren't present on active 
> branches. (We can write a pre-commit hook to make svnsync stop after just one 
> revision, since it doesn't have a --revision option.)
> 
> But for a proper fix?
> 
> In the past we decided that some known ways in which mergeinfo can be 
> malformed 
> should be silently corrected or tolerated.
> 
>   leading slash is required
>     => in some cases we add it if missing
> 
>   path-revs pointing to non-existent node-revs
>     => in some cases, such as a dump-load cycle, we remove these ranges
> 
>   revision zero
>     => in some cases, such as dump, we error out on reading it
> 
>   other parse errors
>     => in some cases we error out
>     => in the case of a merge, we report "invalid mergeinfo: merge 
> tracking not possible"
>     => in some cases we ignore the error in parsing whatever amount of 
> mergeinfo we were trying to parse and skip the processing we were going to do 
> (issue #3896)
> 
> This all makes me a bit uneasy. We seem to have a number of data 
> transformations 
> going on at quite a low level, and I'm not sure what the canonical position 
> is. I would like us to have a definition of what constitutes "the same 
> mergeinfo" in a repository before and after dump/load, and a way of testing 
> that.
> 
> Philip pointed out that it's a good idea for 'dump' to dump whatever 
> is present, and not error out and not try to correct or normalize it. If any 
> correction or normalization is to be done, 'load' is a better place to 
> do it. That minimizes the risk of a damaged archive due to bugs, if you 
> archive 
> the dump file.
> 
> Clearly there are some things we should do:
> 
>   * Make 'dump' not error out, but rather ignore the broken mergeinfo 
> for the purposes of the warning that it refers to older revisions.
> 
> Other changes?
> 
>   * Make 'svnsync sync' strip out the revision 0 from that mergeinfo? Or 
> make it strip out the whole mergeinfo property if it fails to parse? Or just 
> that line of the property value? (If we do something like that, I'd like us 
> to do it everywhere we ignore bad mergeinfo, not just here.)
> 
> Thoughts?
> 
> - Julian
> 
> [1] Servers >= 1.6.18 for HTTP, >= 1.6.17 for the other protocols, reject 
> unparseable mergeinfo -- see issue 
> <http://subversion.tigris.org/issues/show_bug.cgi?id=3895>.
Don't let invalid mergeinfo stop 'svnadmin dump' from producing a dump.
Just issue a warning and continue.

Part of issue #4476 "Mergeinfo containing r0 makes svnsync and svnadmin dump
fail".

(This version adds a new warning type into the API, so is unsuitable for
backport.)

* subversion/include/svn_repos.h
  (svn_repos_notify_warning_t): Add a new warning type for unparseable
    mergeinfo.

* subversion/libsvn_repos/dump.c
  (edit_baton): Add a flag to indicate unparseable mergeinfo was found.
  (verify_mergeinfo_revisions): New function, factored out of ...
  (dump_node): ... here. And if verify_mergeinfo_revisions() produces any
    error, then notify, remember it, clear the error, and continue.
  (get_dump_editor): Pass the new flag through.
  (svn_repos_dump_fs3): If dump_node() detected unparseable mergeinfo,
    report this at the end of the dump.
  (verify_one_revision): Adjust caller, ignoring the new flag.
--This line, and those below, will be ignored--

Index: subversion/include/svn_repos.h
===================================================================
--- subversion/include/svn_repos.h	(revision 1574468)
+++ subversion/include/svn_repos.h	(working copy)
@@ -300,13 +300,20 @@ typedef enum svn_repos_notify_warning_t
    * two or more entries in the same svn:mergeinfo property differ
    * only in character representation (normalization), but are
    * otherwise identical.
    *
    * @since New in 1.9.
    */
-  svn_repos_notify_warning_mergeinfo_collision
+  svn_repos_notify_warning_mergeinfo_collision,
+
+  /** An #SVN_PROP_MERGEINFO property contains unparseable mergeinfo.
+   *
+   * @since New in 1.9.
+   */
+  svn_repos_notify_warning_found_unparseable_mergeinfo
+
 } svn_repos_notify_warning_t;
 
 /**
  * Structure used by #svn_repos_notify_func_t.
  *
  * The only field guaranteed to be populated is @c action.  Other fields are
Index: subversion/libsvn_repos/dump.c
===================================================================
--- subversion/libsvn_repos/dump.c	(revision 1574468)
+++ subversion/libsvn_repos/dump.c	(working copy)
@@ -393,12 +393,15 @@ struct edit_baton
   svn_boolean_t *found_old_reference;
 
   /* If not NULL, set to true if any mergeinfo was dumped which contains
      revisions older than OLDEST_DUMPED_REV. */
   svn_boolean_t *found_old_mergeinfo;
 
+  /* If not NULL, set to true if any unparseable mergeinfo was encountered. */
+  svn_boolean_t *found_unparseable_mergeinfo;
+
   /* reusable buffer for writing file contents */
   char buffer[SVN__STREAM_CHUNK_SIZE];
   apr_size_t bufsize;
 
   /* Structure allows us to verify the paths currently being dumped.
      If NULL, validity checks are being skipped. */
@@ -583,12 +586,54 @@
                              _("Path '%s' exists in r%ld."),
                              path, revision);
 
   return SVN_NO_ERROR;
 }
 
+/* If the mergeinfo in MERGEINFO_STR refers to any revisions older than
+ * OLDEST_DUMPED_REV, issue a warning and set *FOUND_OLD_MERGEINFO to TRUE,
+ * otherwise leave *FOUND_OLD_MERGEINFO unchanged.
+ */
+static svn_error_t *
+verify_mergeinfo_revisions(svn_boolean_t *found_old_mergeinfo,
+                           const char *mergeinfo_str,
+                           svn_revnum_t oldest_dumped_rev,
+                           svn_repos_notify_func_t notify_func,
+                           void *notify_baton,
+                           apr_pool_t *pool)
+{
+  svn_mergeinfo_t mergeinfo, old_mergeinfo;
+
+  SVN_ERR(svn_mergeinfo_parse(&mergeinfo, mergeinfo_str, pool));
+  SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges(
+            &old_mergeinfo, mergeinfo,
+            oldest_dumped_rev - 1, 0,
+            TRUE, pool, pool));
+
+  if (apr_hash_count(old_mergeinfo))
+    {
+      svn_repos_notify_t *notify =
+        svn_repos_notify_create(svn_repos_notify_warning, pool);
+
+      notify->warning = svn_repos_notify_warning_found_old_mergeinfo;
+      notify->warning_str = apr_psprintf(
+        pool,
+        _("Mergeinfo referencing revision(s) prior "
+          "to the oldest dumped revision (r%ld). "
+          "Loading this dump may result in invalid "
+          "mergeinfo."),
+        oldest_dumped_rev);
+
+      if (found_old_mergeinfo)
+        *found_old_mergeinfo = TRUE;
+      notify_func(notify_baton, notify, pool);
+    }
+
+  return SVN_NO_ERROR;
+}
+
 /* Unique string pointers used by verify_mergeinfo_normalization()
    and check_name_collision() */
 static const char normalized_unique[] = "normalized_unique";
 static const char normalized_collision[] = "normalized_collision";
 
 
@@ -1071,37 +1116,35 @@ dump_node(struct edit_baton *eb,
       if (!eb->verify && eb->notify_func && eb->oldest_dumped_rev > 1)
         {
           svn_string_t *mergeinfo_str = svn_hash_gets(prophash,
                                                       SVN_PROP_MERGEINFO);
           if (mergeinfo_str)
             {
-              svn_mergeinfo_t mergeinfo, old_mergeinfo;
-
-              SVN_ERR(svn_mergeinfo_parse(&mergeinfo, mergeinfo_str->data,
-                                          pool));
-              SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges(
-                &old_mergeinfo, mergeinfo,
-                eb->oldest_dumped_rev - 1, 0,
-                TRUE, pool, pool));
-              if (apr_hash_count(old_mergeinfo))
+              svn_error_t *err = verify_mergeinfo_revisions(
+                                   eb->found_old_mergeinfo,
+                                   mergeinfo_str->data, eb->oldest_dumped_rev,
+                                   eb->notify_func, eb->notify_baton,
+                                   pool);
+
+              /* An error in verifying the mergeinfo must not prevent the
+                 data from being dumped. If such an error occurs, clear it,
+                 notify, and remember that it happened. */
+              if (err)
                 {
                   svn_repos_notify_t *notify =
                     svn_repos_notify_create(svn_repos_notify_warning, pool);
 
-                  notify->warning = svn_repos_notify_warning_found_old_mergeinfo;
+                  notify->warning = svn_repos_notify_warning_found_unparseable_mergeinfo;
                   notify->warning_str = apr_psprintf(
-                    pool,
-                    _("Mergeinfo referencing revision(s) prior "
-                      "to the oldest dumped revision (r%ld). "
-                      "Loading this dump may result in invalid "
-                      "mergeinfo."),
-                    eb->oldest_dumped_rev);
-
-                  if (eb->found_old_mergeinfo)
-                    *eb->found_old_mergeinfo = TRUE;
+                    pool, _("Mergeinfo was unparseable."));
                   eb->notify_func(eb->notify_baton, notify, pool);
+
+                  if (eb->found_unparseable_mergeinfo)
+                    *eb->found_unparseable_mergeinfo = TRUE;
+
+                  svn_error_clear(err);
                 }
             }
         }
 
       /* If we're checking UCS normalization, also parse any changed
          mergeinfo and warn about denormalized paths and name
@@ -1591,12 +1634,13 @@ get_dump_editor(const svn_delta_editor_t
                 svn_fs_t *fs,
                 svn_revnum_t to_rev,
                 const char *root_path,
                 svn_stream_t *stream,
                 svn_boolean_t *found_old_reference,
                 svn_boolean_t *found_old_mergeinfo,
+                svn_boolean_t *found_unparseable_mergeinfo,
                 svn_error_t *(*custom_close_directory)(void *dir_baton,
                                   apr_pool_t *scratch_pool),
                 svn_repos_notify_func_t notify_func,
                 void *notify_baton,
                 svn_revnum_t oldest_dumped_rev,
                 svn_boolean_t use_deltas,
@@ -1624,12 +1668,13 @@ get_dump_editor(const svn_delta_editor_t
   eb->current_rev = to_rev;
   eb->use_deltas = use_deltas;
   eb->verify = verify;
   eb->check_normalization = check_normalization;
   eb->found_old_reference = found_old_reference;
   eb->found_old_mergeinfo = found_old_mergeinfo;
+  eb->found_unparseable_mergeinfo = found_unparseable_mergeinfo;
 
   /* In non-verification mode, we will allow anything to be dumped because
      it might be an incremental dump with possible manual intervention.
      Also, this might be the last resort when it comes to data recovery.
 
      Else, make sure that all paths exists at their respective revisions.
@@ -1753,12 +1798,13 @@ svn_repos_dump_fs3(svn_repos_t *repos,
   apr_pool_t *subpool = svn_pool_create(pool);
   svn_revnum_t youngest;
   const char *uuid;
   int version;
   svn_boolean_t found_old_reference = FALSE;
   svn_boolean_t found_old_mergeinfo = FALSE;
+  svn_boolean_t found_unparseable_mergeinfo = FALSE;
   svn_repos_notify_t *notify;
 
   /* Determine the current youngest revision of the filesystem. */
   SVN_ERR(svn_fs_youngest_rev(&youngest, fs, pool));
 
   /* Use default vals if necessary. */
@@ -1825,14 +1871,17 @@ svn_repos_dump_fs3(svn_repos_t *repos,
 
       /* Fetch the editor which dumps nodes to a file.  Regardless of
          what we've been told, don't use deltas for the first rev of a
          non-incremental dump. */
       use_deltas_for_rev = use_deltas && (incremental || rev != start_rev);
       SVN_ERR(get_dump_editor(&dump_editor, &dump_edit_baton, fs, rev,
-                              "", stream, &found_old_reference,
-                              &found_old_mergeinfo, NULL,
+                              "", stream,
+                              &found_old_reference,
+                              &found_old_mergeinfo,
+                              &found_unparseable_mergeinfo,
+                              NULL,
                               notify_func, notify_baton,
                               start_rev, use_deltas_for_rev, FALSE, FALSE,
                               subpool));
 
       /* Drive the editor in one way or another. */
       SVN_ERR(svn_fs_revision_root(&to_root, fs, rev, subpool));
@@ -1908,12 +1957,23 @@ svn_repos_dump_fs3(svn_repos_t *repos,
           notify->warning_str = _("The range of revisions dumped "
                                   "contained mergeinfo "
                                   "which reference revisions outside "
                                   "that range.");
           notify_func(notify_baton, notify, subpool);
         }
+
+      /* Ditto if we issued any warnings about unparseable mergeinfo. */
+      if (found_unparseable_mergeinfo)
+        {
+          notify = svn_repos_notify_create(svn_repos_notify_warning, subpool);
+
+          notify->warning = svn_repos_notify_warning_found_unparseable_mergeinfo;
+          notify->warning_str = _("The range of revisions dumped "
+                                  "contained unparseable mergeinfo.");
+          notify_func(notify_baton, notify, subpool);
+        }
     }
 
   svn_pool_destroy(subpool);
 
   return SVN_NO_ERROR;
 }
@@ -2090,13 +2150,13 @@ verify_one_revision(svn_fs_t *fs,
   void *cancel_edit_baton;
 
   /* Get cancellable dump editor, but with our close_directory handler.*/
   SVN_ERR(get_dump_editor(&dump_editor, &dump_edit_baton,
                           fs, rev, "",
                           svn_stream_empty(scratch_pool),
-                          NULL, NULL,
+                          NULL, NULL, NULL,
                           verify_close_directory,
                           notify_func, notify_baton,
                           start_rev,
                           FALSE, TRUE, /* use_deltas, verify */
                           check_normalization,
                           scratch_pool));

Reply via email to