On Tue, Jun 17, 2014 at 6:31 PM, Evgeny Kotkov <evgeny.kot...@visualsvn.com> wrote:
> Hi,
>
> Here is something I was working on at the hackathon.  This is the problem
> I reported in http://svn.haxx.se/dev/archive-2014-02/0187.shtml
>
> In brief, the problem lies in the way we use the recovery code when
> hotcopying these old repositories.  We grab the YOUNGEST revision from
> the source, do the hotcopy and then use the recovery procedure in order
> to determine the node-id and the copy-id to stamp in the destination
> db/current file.  The problem is that proper recovery requires a *full*
> scan of the FS, which we do not do.  As a consequence, the db/current in
> the destination is messed up, and that is shown by the XFail'ing
> fsfs_hotcopy_old_with_propchanges() test.

Yes, the problem is real: HEAD might not touch a path that belongs to the
latest copy / copied sub-tree.  Hence, we won't catch the highest copy ID
by simply looking at the youngest revision.

> Anyway, this "use recover to obtain db/current contents" approach seems
> broken from the very start.  I would like to fix this problem by
> switching to something more obvious.  We could atomically read the
> (YOUNGEST, NODE-ID, COPY-ID) tuple from the source and write it to the
> destination at the very moment the hotcopy finishes working.  Here is a
> series of three patches, which I am willing to commit one-by-one.  Please
> note that *I did not yet write the log messages* for this patch series
> (probably they are going to include parts of what I've written below).
> I am working on them at this very moment.  And sorry for a lot of text :)
>
> In brief, I have the following plan:
>
> - Extend the existing fsfs_hotcopy_old_with_propchanges() test.  We now
>   know that the problem is not actually in the property changes, but in
>   the node- and copy-id changes.  We can extend the test (which would
>   still be an @XFail on trunk) in order to test *more*.
>
>   This is the first patch of the series.

extend-the-existing-xfail-test v1 looks good to commit.

> - Fix the problem itself.
>   The idea of the fix is pretty simple -- we stop using the
>   recover()-related functions when doing the hotcopy for repositories
>   which have the old FS format.  Previously we did use the recovery
>   mechanism in order to recover the next-node-id and the next-copy-id for
>   the hotcopy target, but did it *wrong*:
>
>   * The thing is that proper recovery actually needs a full rev-by-rev
>     scan of the filesystem, otherwise it might miss some of the
>     next-node- or next-copy-ids.  That's exactly what we encountered
>     here, because now on trunk we run the recovery *only* for the
>     youngest revision of the destination and hence are messing up the
>     node ids in the target 'current' file.
>
>   * We could probably switch to a full recovery scan for this case (that
>     would be simpler in patch terms), but that actually sucks.  You do
>     not want to run a full recovery scan in order for the backup (!) to
>     work.
>
>   So, I've fixed this by atomically reading the 'db/current' contents of
>   the source and updating it as-is in the destination.  The diff might
>   look a bit scary, but this is just a consequence of me extracting the
>   revision copying logic into two separate functions (for old and new FS
>   formats accordingly).

So, in effect, we are actually doing *less* work than before.  I like it.

>   The new logic is pretty simple -- if we encounter an FS with a new
>   format, we just stick to the old code.  If we see FS format 1 or 2, we
>   use the new approach, which is much simpler.  We do not have to deal
>   with the packs and shards and stuff like that and simply copy the
>   revisions and revprops (sticking to the old approach, which copies only
>   files that differ in terms of file stamps).  Then we atomically update
>   the 'db/current' in the destination (we've atomically read the source
>   'db/current' contents just a bit earlier).
>
>   This is the second patch of the series.

That patch in itself is o.k.
However, you should update get_next_revision_ids() in transaction.c to use
the new utility function (or replace it entirely).

> - Cleanup a bit.  Now, because the hotcopy code does not need the
>   recovery code (which seems logical), we can remove the corresponding
>   private API, namely svn_fs_fs__find_max_ids().  As a consequence, the
>   id recovery code is now private to the
>   subversion/libsvn_fs_fs/recovery.c file, and that makes sense from the
>   design point of view.
>
>   This is the third patch of the series.

Removing the internal API is fine; however, you might have kept the code as
a static function without the svn_fs_fs__ name prefix.  There is no need to
manually inline it into the caller -- but that's just me giving an opinion.
I'd be fine with committing that patch as is.

> In case this might be interesting to you, here is a checklist I've run
> against the trunk with all three patches applied:
>
> - Run svnadmin_tests.py with Debug / Windows
> - Run svnadmin_tests.py with Debug and --fsfs-packing / Windows
> - Run svnadmin_tests.py with Debug and --fsfs-sharding / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=3 / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=6 / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=8 / Windows
> - Run svnadmin_tests.py with Release / Windows
> - Run svnadmin_tests.py with Release and --fs-type=bdb / Windows
> - Run svnadmin_tests.py with Release and --fs-type=fsx / Windows
> - Run svnadmin_tests.py with Release and --parallel / Windows
> - Run svnadmin_tests.py with Release and --http-dir / Windows
> - Run all tests / Ubuntu
> - Run hotcopy for the new format serf repository, verify it, check
>   db/current
> - Run hotcopy for the new format googletest repository, verify it, check
>   db/current
> - Run hotcopy for the old format serf repository, verify it, check
>   db/current
> - Run hotcopy for the old format googletest repository, verify it, check
>   db/current
> - Run hotcopy for the PACKED new format serf repository, verify it,
>   check db/current
> - Run hotcopy for the PACKED new format googletest repository, verify
>   it, check db/current
> - Run hotcopy for the PACKED old format serf repository, verify it,
>   check db/current
> - Run hotcopy for the PACKED old format googletest repository, verify
>   it, check db/current
> - Somehow test the incremental hotcopy (say, abort the hotcopy for a
>   sharded repository in the middle of it...)
> - Do the previous test with packed shards
> - Test that the test with changes still fails on trunk (itself)
> - Patch the test in order to run it with a normal repository
>   version (> 3) / Windows
>
> What do you think?

Wow, my total response is shorter than the mere list of tests you ran ;)
Good job.

Cheers!

-- Stefan^2.