On Tue, Jun 17, 2014 at 6:31 PM, Evgeny Kotkov <evgeny.kot...@visualsvn.com> wrote:
> Hi,
>
> Here is something I was working on at the hackathon.  This is the problem
> I reported in http://svn.haxx.se/dev/archive-2014-02/0187.shtml
>
> In brief, the problem lies in the way we use the recovery code when
> hotcopying these old repositories.  We grab the YOUNGEST revision from
> the source, do the hotcopy and then use the recovery procedure in order
> to determine the node-id and the copy-id to stamp in the destination
> db/current file.  The problem is that proper recovery requires a *full*
> scan of the FS, which we do not do.  As a consequence, the db/current in
> the destination is messed up, and that is shown by the XFail'ing
> fsfs_hotcopy_old_with_propchanges() test.

Yes, the problem is real: HEAD might not touch a path that belongs to the
latest copy / copied sub-tree.  Hence, we won't catch the highest copy ID
by simply looking at the youngest revision.

> Anyway, this "use recover to obtain db/current contents" approach seems
> broken from the very start.  I would like to fix this problem by
> switching to something more obvious.  We could atomically read the
> (YOUNGEST, NODE-ID, COPY-ID) tuple from the source and write it to the
> destination at the very moment the hotcopy finishes working.  Here is a
> series of three patches, which I am willing to commit one-by-one.  Please
> note that *I did not yet write the log messages* for this patch series
> (probably they are going to include parts of what I've written below).
> I am working on them at this very moment.  And sorry for a lot of text :)
>
> In brief, I have the following plan:
>
> - Extend the existing fsfs_hotcopy_old_with_propchanges() test.  We now
>   know that the problem is not actually in the property changes, but in
>   the node- and copy-id changes.  We can extend the test (which would
>   still be an @XFail on trunk) in order to test *more*.
>
>   This is the first patch of the series.

extend-the-existing-xfail-test v1 looks good to commit.

> - Fix the problem itself.
>   The idea of the fix is pretty simple -- we stop using the
>   recover()-related functions when doing the hotcopy for repositories
>   which have the old FS format.  Previously we did use the recovery
>   mechanism in order to recover the next-node-id and the next-copy-id for
>   the hotcopy target, but did it *wrong*:
>
>   * The thing is that proper recovery actually needs a full rev-by-rev
>     scan of the filesystem, otherwise it might miss some of the
>     next-node- or next-copy-ids.  That's exactly what we encountered
>     here, because now on trunk we run the recovery *only* for the
>     youngest revision of the destination and hence are messing up the
>     node ids in the target 'current' file.
>
>   * We could probably switch to a full recovery scan for this case (that
>     would be simpler in patch terms), but that actually sucks.  You do
>     not want to run a full recovery scan in order for the backup (!) to
>     work.
>
>   So, I've fixed this by atomically reading the 'db/current' contents of
>   the source and updating it as-is in the destination.  The diff might
>   look a bit scary, but this is just a consequence of me extracting the
>   revision copying logic into two separate functions (for old and new FS
>   formats accordingly).

So, in effect, we are actually doing *less* work than before.  I like it.

>   The new logic is pretty simple -- if we encounter an FS with a new
>   format, we just stick to the old code.  If we see FS format 1 or 2, we
>   use the new approach, which is much simpler.  We do not have to deal
>   with the packs and shards and stuff like that and simply copy the
>   revisions and revprops (sticking to the old approach, which copies only
>   files that differ in terms of file stamps).  Then we atomically update
>   the 'db/current' in the destination (we've atomically read the source
>   'db/current' contents just a bit earlier).
>
>   This is the second patch of the series.

That patch in itself is o.k.
However, you should update get_next_revision_ids() in transaction.c to use
the new utility function (or replace it entirely).

> - Cleanup a bit.  Now, because the hotcopy code does not need the
>   recovery code (which seems logical), we can remove the corresponding
>   private API, namely svn_fs_fs__find_max_ids().  As a consequence, the
>   id recovery code is now private to the
>   subversion/libsvn_fs_fs/recovery.c file, and that makes sense from the
>   design point of view.
>
>   This is the third patch of the series.

Removing the internal API is fine; however, you might have kept the code as
a static function without the svn_fs_fs__ name prefix.  There is no need to
manually inline it into the caller -- but that's just me giving an opinion.
I'd be fine with committing that patch as is.

> In case this might be interesting to you, here is a checklist I've run
> against the trunk with all three patches applied:
>
> - Run svnadmin_tests.py with Debug / Windows
> - Run svnadmin_tests.py with Debug and --fsfs-packing / Windows
> - Run svnadmin_tests.py with Debug and --fsfs-sharding / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=3 / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=6 / Windows
> - Run svnadmin_tests.py with Debug and --server-minor-version=8 / Windows
> - Run svnadmin_tests.py with Release / Windows
> - Run svnadmin_tests.py with Release and --fs-type=bdb / Windows
> - Run svnadmin_tests.py with Release and --fs-type=fsx / Windows
> - Run svnadmin_tests.py with Release and --parallel / Windows
> - Run svnadmin_tests.py with Release and --http-dir / Windows
> - Run all tests / Ubuntu
> - Run hotcopy for the new format serf repository, verify it, check
>   db/current
> - Run hotcopy for the new format googletest repository, verify it, check
>   db/current
> - Run hotcopy for the old format serf repository, verify it, check
>   db/current
> - Run hotcopy for the old format googletest repository, verify it, check
>   db/current
> - Run hotcopy for the PACKED new format serf repository, verify it,
>   check db/current
> - Run hotcopy for the PACKED new format googletest repository, verify
>   it, check db/current
> - Run hotcopy for the PACKED old format serf repository, verify it,
>   check db/current
> - Run hotcopy for the PACKED old format googletest repository, verify
>   it, check db/current
> - Somehow test the incremental hotcopy (say, abort the hotcopy for a
>   sharded repository in the middle of it...)
> - Do the previous test with packed shards
> - Test that the test with changes still fails on trunk (itself)
> - Patch the test in order to run it with a normal repository
>   version (> 3) / Windows
>
> What do you think?

Wow, my total response is shorter than the mere list of tests you ran ;)
Good job.

Cheers!

-- Stefan^2.