[email protected] wrote on Tue, Aug 17, 2010 at 19:04:21 -0000:
> Author: stefan2
> Date: Tue Aug 17 19:04:21 2010
> New Revision: 986453
> 
> URL: http://svn.apache.org/viewvc?rev=986453&view=rev
> Log:
> Speed up EOF and keyword translation in two ways.
> 
> First, don't translate EOFs if the "is" equals the "ought". 
> This reduces the number of calls to translate_newline as 
> well as translate_write and ultimately apr_file_write.
> 
> Second, optimize the memcspn() replacement implementation
> by checking 4 bytes at once. That results in about 3 CPU 
> ticks / byte instead of roughly 8 to 10 ticks before.
> 
> * subversion/libsvn_delta/subst.c
>   (translation_baton): add optimization info field
>   (create_translation_baton): initialize the new member
>   (eol_unchanged): new utiltiy function to compare EOLs
>   (translate_chunk): faster scanning for interesting chars;
>    don't translate unchaning EOLs
> 
> Modified:
>     subversion/branches/performance/subversion/libsvn_subr/subst.c
> 
> Modified: subversion/branches/performance/subversion/libsvn_subr/subst.c
> URL: 
> http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_subr/subst.c?rev=986453&r1=986452&r2=986453&view=diff
> ==============================================================================
> --- subversion/branches/performance/subversion/libsvn_subr/subst.c (original)
> +++ subversion/branches/performance/subversion/libsvn_subr/subst.c Tue Aug 17 
> 19:04:21 2010
> @@ -839,6 +844,33 @@ create_translation_baton(const char *eol
>    return b;
>  }
>  
> +/* Return TRUE if the EOL starting at BUF matches the eol_str member of B.
> + * Be aware of special cases like "\n\r\n" and "\n\n\r". For sequences like
> + * "\n$", the result will be FALSE since it is more efficient to handle
> + * that special case implicitly in the calling code by exiting the quick
> + * scan loop.
> + */
> +static APR_INLINE svn_boolean_t
> +eol_unchanged(struct translation_baton *b,
> +              const char *buf)
> +{

I've tweaked the docstring for a bit of extra clarity, in r1029092.

> +  /* if the first byte doesn't match, the whole EOL won't */
> +  if (buf[0] != b->eol_str[0])
> +    return FALSE;
> +
> +  /* two-char EOLs must be a full match */
> +  if (b->eol_str_len == 2)
> +    return buf[1] == b->eol_str[1];
> +
> +  /* The first char matches the required 1-byte EOL. 
> +   * But maybe, buf[] contains a 2-byte EOL?
> +   * In that case, the second byte will be interesting
> +   * and not be another EOL of its own.
> +   */
> +  return !b->interesting[(unsigned char)buf[1]] || buf[0] == buf[1];
> +}
> +

So, this function returns TRUE if BUF starts with \n and we want \n, or
with \r\n and we want \r\n, or --- when we want \r --- starts with \r
and followed by neither \n nor $.

The single caller ensures that buf[1] is not EOF (and hence is safe to
use as an index into interesting[]), but the docstring should state
that (as a precondition).

> @@ -944,19 +976,71 @@ translate_chunk(svn_stream_t *dst,
>                continue;
>              }
>  
> -          /* We're in the boring state; look for interest characters. */
> -          len = 0;
> +          /* translate_newline will modify the baton for src_format_len==0
> +             or may return an error if b->repair is FALSE.  In all other
> +             cases, we can skip the newline translation as long as source
> +             EOL format and actual EOL format match.  If there is a 
> +             mismatch, translate_newline will be called regardless of 
> +             nl_translation_skippable. 
> +           */
> +          if (b->nl_translation_skippable == svn_tristate_unknown &&
> +              b->src_format_len > 0)
> +            {
> +              /* test whether translate_newline may return an error */
> +              if (b->eol_str_len == b->src_format_len &&
> +                  strncmp(b->eol_str, b->src_format, b->eol_str_len) == 0)
> +                b->nl_translation_skippable = svn_tristate_true;
> +              else if (b->repair) 
> +                b->nl_translation_skippable = svn_tristate_true;
> +              else
> +                b->nl_translation_skippable = svn_tristate_false;
> +            }
> +
> +          /* We're in the boring state; look for interest characters.
> +             Offset len such that it will become 0 in the first iteration. 
> +           */
> +          len = 0 - b->eol_str_len;
> +
> +          /* Look for the next EOL (or $) that actually needs translation.
> +             Stop there or at EOF, whichever is encountered first.
> +           */
> +          do
> +            {
> +              /* skip current EOL */
> +              len += b->eol_str_len;
> +
> +              /* check 4 bytes at once to allow for efficient pipilening
> +                 and to reduce loop condition overhead. */
> +              while ((p + len + 4) <= end)
> +                {
> +                  char sum = interesting[(unsigned char)p[len]]
> +                           | interesting[(unsigned char)p[len+1]]
> +                           | interesting[(unsigned char)p[len+2]]
> +                           | interesting[(unsigned char)p[len+3]];
> +
> +                  if (sum != 0)
> +                    break;
> +
> +                  len += 4;
> +                }
> +
> +               /* Found an interesting char or EOF in the next 4 bytes. 
> +                  Find its exact position. */
> +               while ((p + len) < end && !interesting[(unsigned char)p[len]])
> +                 ++len;
> +            }
> +          while (b->nl_translation_skippable &&   /* can skip EOLs at all */
> +                 p + len + 2 < end &&             /* not too close to EOF */
> +                 eol_unchanged (b, p + len));     /* EOL format already ok */
>  

Haven't read the whole hunk yet, but this jumped to my eye:

Since nl_translation_skippable is an svn_tristate_t, shouldn't this test
explicitly for 'svn_tristate_true'?  (Otherwise, the test would evaluate
to true even for svn_tristate_false...)

Perhaps we should name the variable in a way that indicates its
non-booleanness --- e.g., tri_nl_translation_skippable --- ?

> -          /* We wanted memcspn(), but lacking that, the loop below has
> -             the same effect. Also, skip NUL characters.
> -          */
>            while ((p + len) < end && !interesting[(unsigned char)p[len]])
>              len++;
>  
>            if (len)
> -            SVN_ERR(translate_write(dst, p, len));
> -
> -          p += len;
> +            {
> +              SVN_ERR(translate_write(dst, p, len));
> +              p += len;
> +            }
>  
>            /* Set up state according to the interesting character, if any. */
>            if (p < end)
> 
> 

Reply via email to