Re: svn commit r1435746 - svn_fs_verify() - issues #3995, #4211

Julian Foad Wed, 23 Jan 2013 09:57:22 -0800

> URL: http://svn.apache.org/viewvc?rev=1435746&view=rev


> Log:
> Fix issues #3995 (redesign svn_fs_verify() for 1.8) and #4211 (verify
> is slow and needs to handle node verification better).
> 
> This patch does two things. First, it adds a notification callback
> to svn_fs_verify() and forwards its output to our console stream.
> 
> The second change is that revision-specific checks are begin moved
> from svn_fs_fs__verify() to a new svn_fs_verify_rev() function.
> Because the latter is being called as part of the per-revision checks,
> progress is visible and access / cache locality is very much improved.

The latter is now being called as part of svn_repos_verify_fs2()'s per-revision 
checking loop (not as part of svn_fs_[fs__]verify()'s per-rev checks, which is 
how I interpreted this at first).

Because of that, the already existing '* Verified revision R' notification from 
that loop now means both 'dump revision R' and svn_fs_verify_rev(R) have been 
done.  There will no longer be the initial long delay which was previously 
caused by svn_fs_verify() verifying all revs in the repo before the first '* 
Verified revision R' notification.

But you are still calling svn_fs_verify(..., start, end, ...) before the 
per-rev loop.  This was puzzling until I saw that the 'start' and 'end' 
arguments here are no longer used for per-rev full verification but are still 
used for rep-cache verification.  This is now the only potentially long-running 
check before the main loop starts, and it now sends its own notifications.

So the overall output is now like this (on one of my test repos):
* Verifying global structure ...
* Verifying structure at revision 74 ...
* Verifying structure at revision 481 ...
* Verifying structure at revision 831 ...
* Verified revision 0.
* Verified revision 1.
* Verified revision 2.
* Verified revision 3.

OK.  Maybe s/structure/repository metadata/; what do others think?

> /**
>  * Perform backend-specific data consistency and correctness validations
>  * to the Subversion filesystem located in the directory @a path.
>  *
>  * @a start and @a end may be #SVN_INVALID_REVNUM, in which case
>  * svn_repos_verify_fs2()'s semantics apply.  When @c r0 is being
>  * verified, global invariants may be verified as well.
>  * ... */
> svn_fs_verify(path, cancel..., notify..., start, end, pool);

This needs to say what the START and END parameters are for, and make clear 
that it doesn't call svn_fs_verify_rev() for each rev.

> /**
>  * Perform backend-specific data consistency and correctness validations
>  * to revision @a revision of the Subversion filesystem @a fs.
>  * ... */
> svn_fs_verify_rev(fs, revision, pool);

I'm still unclear of the relationship between svn_fs_verify_rev() and 
performing a dump of the rev.  I assume they are complementary, but it seems 
odd, as it feels to me like a good implementation of svn_fs_verify_rev() would 
do all that a dump does and more.


> /* Verify the fsfs filesystem FS.  Use POOL for temporary allocations. */
> svn_fs_fs__verify(fs, cancel..., notify..., start, end, pool);
> 
> /* Verify REVISION in filesystem FS.  Use POOL for temporary allocations. */
> svn_fs_fs__verify_rev(fs, revision, pool);
> 
> /* Verify metadata for ROOT.
>    ### Currently only implemented for revision roots. */
> svn_fs_fs__verify_root(svn_fs_root_t *root, pool);

Similar docs are needed here too, or a cross-reference such as "Implements 
svn_fs_verify...() for FSFS".


The order of parameters in svn_fs_verify() and the corresponding vtable 
function and similar places  would be more consistent with elsewhere if 
rearranged as:
  (main params,
   notification,
   cancellation,
   pools).


> Modified: subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c Sat Jan 19 22:45:02
> @@ -135,6 +135,8 @@ svn_fs_fs__walk_rep_reference(svn_fs_t *
>                                void *walker_baton,
>                                svn_cancel_func_t cancel_func,
>                                void *cancel_baton,
> +                              svn_fs_progress_notify_func_t notify_func,
> +                              void *notify_baton,
>                                svn_revnum_t start,
>                                svn_revnum_t end,
>                                apr_pool_t *pool)
> @@ -164,6 +167,9 @@ svn_fs_fs__walk_rep_reference(svn_fs_t *
>        SVN_ERR(svn_sqlite__reset(stmt));
>        if (SVN_IS_VALID_REVNUM(max))  /* The rep-cache could be empty. */
>          SVN_ERR(svn_fs_fs__revision_exists(max, fs, iterpool));
> +
> +      if (notify_func)
> +        notify_func(SVN_INVALID_REVNUM, notify_baton, iterpool);
>      }
> 
>    SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db,
> @@ -210,6 +216,16 @@ svn_fs_fs__walk_rep_reference(svn_fs_t *
>          return svn_error_compose_create(err, svn_sqlite__reset(stmt));
> 
>        SVN_ERR(svn_sqlite__step(&have_row, stmt));
> +
> +      /* Notify (occasionally, because walking is fast and we can't
> +         guarantee a properly ordered notification sequence anyway) */
> +      if (   notify_func
> +          && (iterations % 1024 == 0)
> +          && (rep->revision != last_notified_revision))
> +        {
> +          notify_func(rep->revision, notify_baton, iterpool);
> +          last_notified_revision = rep->revision;
> +        }
>      }

This function already calls a callback as its main purpose.  We don't need to 
add a separate notification 
mechanism so it can tell us that it's called the callback: we can implement 
whatever notification we want, *inside* the 
existing callback.

There's only one special case here, which is you put a notification in after it 
does the initial global invariant.  That's not really needed, as the caller 
knows it's about to be done when it requests start==0, and knows it's been done 
when it gets the first callback.  (If you really want to make a call at this 
point, it would be equally OK to call the main callback with REP=NULL.)

Advantages: (1) simplicity; (2) the *user* of the function controls how often 
it notifies: it might want to do so on the basis of elapsed time rather than 
number of loop iterations for example.


- Julian

Re: svn commit r1435746 - svn_fs_verify() - issues #3995, #4211

Reply via email to