Sweeping through my TODO list turned up this thread. I'll commit this last patch tomorrow if there are no objections.
I pause a day only because I recall Mike saying he was planning to take a look at this. Paul On Tue, May 11, 2010 at 4:00 PM, Paul Burba <ptbu...@gmail.com> wrote: > On Tue, Apr 20, 2010 at 11:08 PM, C. Michael Pilato <cmpil...@collab.net> > wrote: >> Paul Burba wrote: >>> Hi Mike and Julian, >>> >>> Sorry for the delayed response; been working on/thinking about other >>> mergey stuff lately... >>> >>> A few inline comments and then a proposed course of action at the end. >>> >>> On Thu, Apr 15, 2010 at 9:28 AM, C. Michael Pilato <cmpil...@collab.net> >>> wrote: >>>> The question then becomes, "How do users deal with legitimate partial dumps >>>> that will be loaded atop something other than loads from previous >>>> incremental windows?" I think they do this the same way they handle the >>>> copy case: either hand-hacking the dumpstream or using 'svndumpfilter'. >>>> So >>>> maybe 'svndumpfilter' grows the logic and options required to pare away >>>> mergeinfo that doesn't meet some criteria (such as "ranges outside the dump >>>> stream range")? >>> >>> I vote for svndumpfilter growing the logic to filter ranges predating >>> the start of the dump stream in the same way it strips mergeinfo with >>> merge sources filtered from the dump stream. Basically this means >>> moving the r927243 functionality to svndumpfilter. No really sure we >>> need a new option, seems we could overload >>> --skip-missing-merge-sources to include this behavior. Thoughts? >> >> +1 >> >>>> Julian Foad wrote: >>>>> On Wed, 2010-04-14, C. Michael Pilato wrote: >>>>>> Assuming similar behavior for mergeinfo handling, we have, at a minimum: >>>>>> >>>>>> 1. 'svnadmin dump' warns when it is in incremental mode and must >>>>>> generate >>>>>> mergeinfo from a merge source that predates the beginning of the dump >>>>>> window, but it's only a warning and the dump continues. >>> >>> Why only --incremental mode? We should warn for both --incremental >>> dumps *and* non-incremental dumps because in the latter case we might >>> be specifying a dump range -rX:Y where rX>0* and there might be >>> mergeinfo that refers to revs < X. And while we might want to be >>> clever about skipping the warning when no range is specified (i.e. >>> defaulting to -r0:HEAD), even then we should warn because issue #3020 >>> might already have resulted in a r1 with mergeinfo in the starting >>> repository. >> >> You're right, of course. I was thinking "partial dump" and saying >> "incremental". My bad. >> >>>>>> 2. 'svnadmin load' does nothing smart, trusting that the dump it's being >>>>>> fed is a sensible one. >>>>> Better: 'svnadmin load' tries to validate the mergeinfo, and issues a >>>>> warning if it refers to a non-existent source. The admin then gets to >>>>> figure out whether it was bad in the first place or because of his/her >>>>> partial-dump/partial-load scenario. >>>>> >>>>> However, it depends how efficient the checking is. If that would make >>>>> the 'load' really slow, I can see that not being wanted. >> >> [...] >> >>> No unsurprisingly, this took significantly (62%) longer, 00:05:39.609. >> >> (My vote was for 'svnadmin load' does nothing smart, and I stand by it.) >> >>> Taking what you've had to say into consideration, I propose the following: >>> >>> 1) Revert http://svn.apache.org/viewvc?view=revision&revision=927243. >>> This stops svnadmin load from automatically filtering mergeinfo >>> referring to revisions outside of the load stream. >>> >>> 2) Reapply or reimplement the part of r327243 that accounts for the >>> case where the first loaded revision in a load stream has mergeinfo, >>> see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc10 >>> item #1. >> >> (r927243, you mean.) >> >>> 3) Keep the inline copy-source warnings as they exist today in >>> svnadmin dump, but add a new end-of-process summary warning e.g.: >>> >>> * Dumped revision 504. >>> * Dumped revision 505. >>> * Dumped revision 506. >>> * Dumped revision 507. >>> WARNING: Referencing data in revision 250, which is older than the oldest >>> WARNING: dumped revision (504). Loading this dump into an empty >>> repository >>> WARNING: will fail. >>> * Dumped revision 58. >>> * Dumped revision 59. >>> * Dumped revision 510. >>> * Dumped revision 511. >>> . >>> . >>> . >>> WARNING: The range of revisions dumped contained references to copy >>> sources outside that range. >>> >>> 4) Implement inline and summary functionality analogous to 3) but for >>> mergeinfo ranges outside of the dump stream, e.g.: >>> >>> * Dumped revision 504. >>> * Dumped revision 505. >>> * Dumped revision 506. >>> * Dumped revision 507. >>> WARNING: Mergeinfo referencing revision(s) prior to revision 250, >>> which is older >>> WARNING: than the oldest dumped revision (504). Loading this dump may >>> result >>> WARNING: in invalid mergeinfo. >>> * Dumped revision 58. >>> * Dumped revision 59. >>> * Dumped revision 510. >>> * Dumped revision 511. >>> . >>> . >>> . >>> WARNING: The range of revisions dumped contained references to >>> mergeinfo outside that range. >>> >>> Also, as mentioned earlier, I think this warning should kick in >>> regardless of whether --incremental is used. >>> >>> 5) Overload svndumpfilter's --skip-missing-merge-sources option to >>> also remove mergeinfo that predates the starting revision of the dump >>> stream. >> >> +1 all over this stuff. >> >>> 6) Open: How does svnadmin load deal with mergeinfo that doesn't exist >>> in the load steam or the target repos: >>> >>> A) It doesn't. >>> >>> B) Only checks that the revs are valid (fast, but >>> can allow mergeinfo that describes invalid >>> mergesource:rev pairs) >>> >>> C) Find an efficient way to test for valid >>> mergesource:rev pairs >> >> For now, A (or maybe B). Then we have our Paul back. :-) >> > > Been down the rabbit hole on-and-off the last couple weeks with this. > Did everything we discussed so far in this thread, plus a slew of new > bugs, see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc15 > and onward. > > At any rate, after my recent work we are left with one common(?) use > case still broken: > > Loading a sequence of incremental dumps when the fist load in the > sequence is into a *non-empty* repostitory. > > This is tested in svnadmin_tests.py 20 'don't filter mergeinfo revs > from incremental dump', specifically see '# PART 4: Load a a series of > incremental dumps to an non-empty repository." > > Basically the problem is this: > > 1) Given repository ReposA with X revisions in it. > > Note: Previously this use case was also broken if ReposB > was empty to start, but this now works (assuming none of > the incremental dumps in step 3 were svndumpfiltered in > such a way as to remove/renumber revisions). > > 2) Given repository ReposB with Y revisions in it. > > 3) Incrementally (but fully) dump ReposA > > svnadmin dump ReposA -r0:100 > A.0.100.dump > svnadmin dump ReposA -r101:200 --incremental > A.101.200.dump > svnadmin dump ReposA -r201:300 --incremental > A.201.300.dump > . > . > . > svnadmin dump ReposA -rW:X --incremental > A.W.X.dump > > 4) Load the ReposA incremental dumps to ReposB > > svnadmin load ReposB --parent-dir /some/subtree < A.0.100.dump > svnadmin load ReposB --parent-dir /some/subtree < A.101.200.dump > svnadmin load ReposB --parent-dir /some/subtree < A.201.300.dump > . > . > . > svnadmin load ReposB --parent-dir /some/subtree < A.W.X.dump > > The problem arises when one of the incremental loads contains > references to mergeinfo source revisions that predate the start of the > dump. For example, say A.201.300.dump contains mergeinfo referencing > r82. When this is loaded into ReposB the reference should be changed > to r(82+Y) to reflect the fact that ReposB has Y revisions in it > before we ever started loading parts of ReposA. Currently no change > is made and the source rev stays at r82, which is obviously incorrect. > > The attached patch fixes this problem. It takes all mergeinfo which > predates the first revision in the dump stream and adjusts it by the > difference between the head rev in ReposB and the current dump stream > revision. Yes, this is a fairly fragile solution (as is our copyfrom > sources mapping, which this is based on). If unrelated commits are > made to ReposB between loads of the incremental dumps, then the > mergeinfo source revs will be wrong; but it is *always* wrong right > now, at least this would fix this use case. Opinions appreciated. > > And yes, there are still other potential problems: Think incremental > dumps that are dumpfiltered such that some revs are dropped or > renumbered. But I'm not sure we really want to solve *every* problem > in this space, I'm not even sure we can (without bringing the > performance of svnadmin load to its knees). I think this patch brings > us to "good enough" with the caveat that ill-advised use of svnadmin > dump/load and/or svndumpfilter can result in incorrect mergeinfo. > > Paul >
One last issue #3020 fix: Correctly map mergeinfo revisions when loading incremental dumps into a repository that was not empty prior to the first load. * subversion/include/private/svn_mergeinfo_private.h (svn_mergeinfo__adjust_mergeinfo_rangelists): New. * subversion/libsvn_subr/mergeinfo.c (svn_mergeinfo__adjust_mergeinfo_rangelists): New. * subversion/libsvn_repos/load.c (renumber_mergeinfo_revs): Adjust mergeinfo older than the oldest revision in the dump stream by the difference between the head rev of the target repository and the current dump stream rev. (new_revision_record): Update parse_baton.oldest_old_rev here, a bit sooner than we did before when we set it... (close_revision): ...here. * subversion/tests/cmdline/svnadmin_tests.py
Index: subversion/include/private/svn_mergeinfo_private.h =================================================================== --- subversion/include/private/svn_mergeinfo_private.h (revision 943236) +++ subversion/include/private/svn_mergeinfo_private.h (working copy) @@ -212,6 +212,21 @@ svn_boolean_t inheritable, apr_pool_t *result_pool); +/* Adjust in-place MERGEINFO's rangelists by OFFSET. If OFFSET is negative + and would adjust any part of MERGEINFO's source revisions to 0 or less, + then those revisions are dropped. If all the source revisions for a merge + source path are dropped, then the path itself is dropped. If all merge + source paths are dropped, then *ADJUSTED_MERGEINFO is set to an empty + hash. *ADJUSTED_MERGEINFO is allocated in RESULT_POOL. SCRATCH_POOL is + used for any temporary allocations. */ +svn_error_t * +svn_mergeinfo__adjust_mergeinfo_rangelists(svn_mergeinfo_t *adjusted_mergeinfo, + svn_mergeinfo_t mergeinfo, + svn_revnum_t offset, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + #ifdef __cplusplus } #endif /* __cplusplus */ Index: subversion/libsvn_repos/load.c =================================================================== --- subversion/libsvn_repos/load.c (revision 943236) +++ subversion/libsvn_repos/load.c (working copy) @@ -278,11 +278,31 @@ apr_pool_t *pool) { apr_pool_t *subpool = svn_pool_create(pool); - apr_hash_t *mergeinfo; - apr_hash_t *final_mergeinfo = apr_hash_make(subpool); + svn_mergeinfo_t mergeinfo, predates_stream_mergeinfo; + svn_mergeinfo_t final_mergeinfo = apr_hash_make(subpool); apr_hash_index_t *hi; SVN_ERR(svn_mergeinfo_parse(&mergeinfo, initial_val->data, subpool)); + + if (rb->pb->oldest_old_rev > 1) + { + SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges( + &predates_stream_mergeinfo, mergeinfo, + rb->pb->oldest_old_rev - 1, 0, + TRUE, subpool, subpool)); + SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges( + &mergeinfo, mergeinfo, + rb->pb->oldest_old_rev - 1, 0, + FALSE, subpool, subpool)); + SVN_ERR(svn_mergeinfo__adjust_mergeinfo_rangelists( + &predates_stream_mergeinfo, predates_stream_mergeinfo, + -rb->rev_offset, subpool, subpool)); + } + else + { + predates_stream_mergeinfo = NULL; + } + for (hi = apr_hash_first(subpool, mergeinfo); hi; hi = apr_hash_next(hi)) { const char *merge_source; @@ -348,6 +368,11 @@ apr_hash_set(final_mergeinfo, merge_source, APR_HASH_KEY_STRING, rangelist); } + + if (predates_stream_mergeinfo) + SVN_ERR(svn_mergeinfo_merge(final_mergeinfo, predates_stream_mergeinfo, + subpool)); + SVN_ERR(svn_mergeinfo_sort(final_mergeinfo, subpool)); /* Mergeinfo revision sources for r0 and r1 are invalid; you can't merge r0 @@ -1053,6 +1078,10 @@ SVN_ERR(svn_stream_printf(pb->outstream, pool, _("<<< Started new transaction, based on " "original revision %ld\n"), rb->rev)); + + /* Stash the oldest "old" revision committed from the load stream. */ + if (!SVN_IS_VALID_REVNUM(pb->oldest_old_rev)) + pb->oldest_old_rev = rb->rev; } /* If we're parsing revision 0, only the revision are (possibly) @@ -1411,10 +1440,6 @@ return svn_error_return(err); } - /* Stash the oldest "old" revision committed from the load stream. */ - if (!SVN_IS_VALID_REVNUM(pb->oldest_old_rev)) - pb->oldest_old_rev = *old_rev; - /* Run post-commit hook, if so commanded. */ if (pb->use_post_commit_hook) { Index: subversion/libsvn_subr/mergeinfo.c =================================================================== --- subversion/libsvn_subr/mergeinfo.c (revision 943236) +++ subversion/libsvn_subr/mergeinfo.c (working copy) @@ -1998,6 +1998,58 @@ return SVN_NO_ERROR; } +svn_error_t * +svn_mergeinfo__adjust_mergeinfo_rangelists(svn_mergeinfo_t *adjusted_mergeinfo, + svn_mergeinfo_t mergeinfo, + svn_revnum_t offset, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_hash_index_t *hi; + *adjusted_mergeinfo = apr_hash_make(result_pool); + + if (mergeinfo) + { + for (hi = apr_hash_first(scratch_pool, mergeinfo); + hi; + hi = apr_hash_next(hi)) + { + int i; + const char *path = svn__apr_hash_index_key(hi); + apr_array_header_t *rangelist = svn__apr_hash_index_val(hi); + apr_array_header_t *adjusted_rangelist = + apr_array_make(result_pool, rangelist->nelts, + sizeof(svn_merge_range_t *)); + + for (i = 0; i < rangelist->nelts; i++) + { + svn_merge_range_t *range = + APR_ARRAY_IDX(rangelist, i, svn_merge_range_t *); + + if (range->start + offset > 0 && range->end + offset > 0) + { + if (range->start + offset < 0) + range->start = 0; + else + range->start = range->start + offset; + + if (range->end + offset < 0) + range->end = 0; + else + range->end = range->end + offset; + APR_ARRAY_PUSH(adjusted_rangelist, svn_merge_range_t *) = + range; + } + } + + if (adjusted_rangelist->nelts) + apr_hash_set(*adjusted_mergeinfo, apr_pstrdup(result_pool, path), + APR_HASH_KEY_STRING, adjusted_rangelist); + } + } + return SVN_NO_ERROR; +} + svn_boolean_t svn_mergeinfo__is_noninheritable(svn_mergeinfo_t mergeinfo, apr_pool_t *scratch_pool) Index: subversion/tests/cmdline/svnadmin_tests.py =================================================================== --- subversion/tests/cmdline/svnadmin_tests.py (revision 943236) +++ subversion/tests/cmdline/svnadmin_tests.py (working copy) @@ -1070,6 +1070,7 @@ load_and_verify_dumpstream(sbox, [], [], None, open(dump_file_r11_13).read(), '--ignore-uuid') + #raise svntest.Failure("PTB") load_and_verify_dumpstream(sbox, [], [], None, open(dump_file_r14_15).read(), '--ignore-uuid') @@ -1162,28 +1163,6 @@ # Check the resulting mergeinfo. We expect the exact same results # as Part 3. - # - # Currently this fails because our current logic mapping mergeinfo revs - # in the load stream to their new values based on the offset of the - # target repository is quite flawed. Right now this is the resulting - # mergeinfo: - # - # Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B1\B - # svn:mergeinfo - # /Projects/Project-X/branches/B2/B/E:11-12 - # /Projects/Project-X/trunk/B/E:5-6,8-9 - # Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B1': - # svn:mergeinfo - # /Projects/Project-X/branches/B2:11-18 - # ^^ - # The *only* correct rev here! - # /Projects/Project-X/trunk:6,9 - # Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B2': - # svn:mergeinfo - # /Projects/Project-X/trunk:9 - # - # See http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc16 for - # more info. svntest.actions.run_and_verify_svn(None, expected_output, [], 'propget', 'svn:mergeinfo', '-R', sbox.repo_url) @@ -1215,7 +1194,7 @@ create_in_repo_subdir, SkipUnless(verify_with_invalid_revprops, svntest.main.is_fs_type_fsfs), - XFail(dont_drop_valid_mergeinfo_during_incremental_loads), + dont_drop_valid_mergeinfo_during_incremental_loads, ] if __name__ == '__main__':