stef...@apache.org wrote on Mon, Dec 26, 2011 at 23:37:25 -0000:
> Author: stefan2
> Date: Mon Dec 26 23:37:25 2011
> New Revision: 1224836
> 
> URL: http://svn.apache.org/viewvc?rev=1224836&view=rev
> Log:
> Tune FSFS deltification stratey: Use linear deltification on the very top of 
> the
> deltification history and skip-delta only for larger distances.  Most of the 
> runtime
> overhead is being counterbalanced by reduced I/O and masked by our membuffer
> caching.
> OTOH, we examine skip delta ranges linearly during commit. Therefore, runtime
> becomes an issue when committing nodes with very deep change histories.
> This patch simply limits the deltification history to some reasonable length.
> 
> * subversion/libsvn_fs_fs/fs_fs.c
>   (SVN_FS_FS_MAX_LINEAR_DELTIFICATION,
>    SVN_FS_FS_MAX_DELTIFICATION_WALK): new tuning parameters
>   (choose_delta_base): implement the new deltification strategy
> * notes/knobs
>   document the new defines
> 
> Modified:
>     subversion/trunk/notes/knobs
>     subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> 
> Modified: subversion/trunk/notes/knobs
> URL: 
> http://svn.apache.org/viewvc/subversion/trunk/notes/knobs?rev=1224836&r1=1224835&r2=1224836&view=diff
> ==============================================================================
> --- subversion/trunk/notes/knobs (original)
> +++ subversion/trunk/notes/knobs Mon Dec 26 23:37:25 2011
> @@ -33,6 +33,8 @@ DEFAULT_HTTP_LIBRARY
>  MAX_SECS_TO_LINGER
>  SUFFIX_LINES_TO_KEEP
>  SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR
> +SVN_FS_FS_MAX_LINEAR_DELTIFICATION
> +SVN_FS_FS_MAX_DELTIFICATION_WALK
>  SVN_UNALIGNED_ACCESS_IS_OK
>  
>  2.2 Features
> @@ -125,7 +127,25 @@ SVN_I_LIKE_LATENCY_SO_IGNORE_HTTPV2
>    Range:     natural integers
>    Suggested: 1, 2, 3, 4, 5, 7, 11
>  
> -3.6 SVN_UNALIGNED_ACCESS_IS_OK
> +3.6 SVN_FS_FS_MAX_LINEAR_DELTIFICATION
> +
> +  Scope:     libsvn_fs_fs
> +  Purpose:   max length + 1 of the linear deltification history
> +             before skip-deltification kicks in
> +  Default:   16
> +  Range:     natural integers
> +  Suggested: 2, 4, 8, 16, 32, 64
> +
> +3.7 SVN_FS_FS_MAX_DELTIFICATION_WALK
> +
> +  Scope:     libsvn_fs_fs
> +  Purpose:   max skip deltification range. Change histories
> +             longer than that will be restarted with a fulltext.
> +  Default:   1023
> +  Range:     natural integers
> +  Suggested: 1, 2, 3, 4, 5, 7, 11
> +

The suggested values (and Range) for both of these are inconsistent with
the comments at the definitions (which want a power of two, or a power
of two minus one).

> +3.8 SVN_UNALIGNED_ACCESS_IS_OK
>  
>    Scope:     (global)
>    Purpose:   enable data accesss optimizations.
> 
> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> URL: 
> http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1224836&r1=1224835&r2=1224836&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Mon Dec 26 23:37:25 2011
> @@ -78,6 +78,20 @@
>  #define SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR 1000
>  #endif
>  
> +/* Begin deltification after a node history exceeded this this limit.
> +   Useful values are 4 to 64 with 16 being a good compromise between
> +   computational overhead and pository size savings.
> +   Should be a power of 2.  
> +   Values < 2 will result in standard skip-delta behavior. */
> +#define SVN_FS_FS_MAX_LINEAR_DELTIFICATION 16
> +
> +/* Finding a deltification base takes operations proportional to the
> +   number of changes being skipped. To prevent exploding runtime
> +   during commits, limit the deltification range to this value.
> +   Should be a power of 2 minus one.
> +   Values < 1 disable deltification. */
> +#define SVN_FS_FS_MAX_DELTIFICATION_WALK 1023
> +
>  /* Following are defines that specify the textual elements of the
>     native filesystem directories and revision files. */
>  
> @@ -5186,6 +5200,7 @@ choose_delta_base(representation_t **rep
>                    apr_pool_t *pool)
>  {
>    int count;
> +  int walk;
>    node_revision_t *base;
>  
>    /* If we have no predecessors, then use the empty stream as a
> @@ -5203,6 +5218,23 @@ choose_delta_base(representation_t **rep
>    count = noderev->predecessor_count;
>    count = count & (count - 1);
>  
> +  /* We use skip delta for limiting the number of delta operations 
> +     along very long node histories.  Close to HEAD however, we create
> +     a linear history to minimize delta size.  */
> +  walk = noderev->predecessor_count - count;
> +  if (walk < SVN_FS_FS_MAX_LINEAR_DELTIFICATION)
> +    count = noderev->predecessor_count - 1;
> +

So, before this change the number of delta steps of the Nth revision was
equal the number of 1-bits in the binary representation of
predecessor_count, with this change it's equal to (predecessor_count
& 15) + (the number of 1-bits in the binary representation of
(predecessor_count >> 4)), right?

In other words, this change doesn't affect the high level structure of
the deltas tree (except it makes each leaf a 15-deltas chain), but it
trades disk space savings for more delta combinations in historical cats
(eg, revision 16n+8 will need X+8 deltas, instead of X+1.)

Are there scenarios in which this change is significant?  What about
people who store large binaries that change frequently in svn, should we
be recommending them to set SVN_FS_FS_MAX_LINEAR_DELTIFICATION to
a small number?  Should we be making these fsfs.conf settings, instead
of compile-time macros?

> +  /* Finding the delta base over a very long distance can become extremely
> +     expensive for very deep histories, possibly causing client timeouts etc.
> +     OTOH, this is a rare operation and its gains are minimal. Lets simply
> +     start deltification anew close every other 1000 changes or so.  */
> +  if (walk > SVN_FS_FS_MAX_DELTIFICATION_WALK)
> +    {
> +      *rep = NULL;
> +      return SVN_NO_ERROR;
> +    }
> +
>    /* Walk back a number of predecessors equal to the difference
>       between count and the original predecessor count.  (For example,
>       if noderev has ten predecessors and we want the eighth file rev,
> 
> 

Reply via email to