string.c

Stefan Fuhrmann Mon, 11 Aug 2014 09:26:28 -0700

On Mon, Aug 4, 2014 at 2:53 PM, Ivan Zhakov <i...@visualsvn.com> wrote:


> On 30 July 2014 20:27,  <stef...@apache.org> wrote:
> > Author: stefan2
> > Date: Wed Jul 30 16:27:44 2014
> > New Revision: 1614703
> >
> > URL: http://svn.apache.org/r1614703
> > Log:
> > On the authzperf branch:  Implement support for generic / complex
> > wildcard patterns like "/foo*bar*baz/".
> >
> > To do this efficiently requires normalized pattern paths, so we
> > add the normalization logic as described here:
> > https://wiki.apache.org/subversion/AuthzImprovements
> >
> Stefan,
>
> Please find my comments inline.
>

[...]

> +normalize_wildcards(const char *path,
> > +                    apr_pool_t *result_pool)
> > +{
> > +  const char *pos;
> > +  svn_stringbuf_t *buffer = svn_stringbuf_create(path, result_pool);
> > +
> > +  /* Reduce sequences variable-length segment matches to single segment
> > +   * matches with the other segment patterns reduced to "*". */
> > +  while (svn_stringbuf_replace_all(buffer, "/**/**/", "/*/**/"))
> > +    ;
> > +
> > +  /* Our tree traversal is more efficient if we put variable segment
> count
> > +   * wildcards last. */
> > +  while (svn_stringbuf_replace_all(buffer, "/**/*/", "/**/*/"))
> > +    ;
> > +
> > +  /* Reduce trailing "**" a single "*". */
> > +  while (   buffer->len > 1
> > +      && buffer->data[buffer->len-1] == '*'
> > +      && buffer->data[buffer->len-2] == '*')
> > +    svn_stringbuf_remove(buffer, buffer->len-1, 1);
> > +
> > +  /* Reduce "**" _inside_ a segment to a single "*". */
> > +  for (pos = strstr(buffer->data, "**"); pos; pos = strstr(pos, "**"))
> > +    if ((pos > buffer->data && pos[-1] != '/') || (pos[2] != '/'))
> > +      svn_stringbuf_remove(buffer, pos - buffer->data, 1);
> > +    else
> > +      ++pos;
> Quick quiz: I'm only one dev@ list who do not understand magic peace
> of code above?


I honestly hope you are. Nice typo, though ;)


> I'm fine if majority of Subversion devs do not consider
> such code as cryptic unreviewable magic.
>

[...]


> > +static svn_boolean_t
> > +match_pattern(const char *s,
> > +              const char *pattern)
> > +{
> > +  /* Matching a wildcard pattern is trival:
> > +   * PATTERN can be considered a list of literal strings separated by
> '*'.
> > +   * We simply have to find all sub-strings in that order, i.e. we can
> do
> > +   * so greedily.  Be careful to match prefix and suffix correctly.
> > +   */
> > +  char pattern_char, s_char;
> > +
> > +  /* The prefix part of PATTERN needs special treatment as we can't just
> > +   * match any substring of S. */
> > +  if (pattern[0] != '*')
> You'are using inconsitent style to access first char in pattern:
> pattern[0] and *pattern. It makes code review harder. Why you're doing
> this way?
>

You are right that both are interchangeable.
However, I prefer the index notation when treating
something as a string and the pointer notation when
it is (moving) pointer. Since both are the same type
in C, there is some gray area where I would useither.


>
> > +    {
> > +      apr_size_t match_len;
> > +
> > +      /* match_to_next_wildcard() assumes a that the first char
> matches. */
> > +      if (pattern[0] != s[0])
> > +        return FALSE;
> > +
> > +      /* Match up to but not beyond the next wildcard. */
> > +      match_len = match_to_next_wildcard(s, pattern);
> > +      if (match_len == 0)
> > +        return FALSE;
> > +
> > +      /* Continue at next wildcard or end-of-string. */
> > +      pattern += match_len;
> > +      s += match_len;
> > +    }
> > +
> > +  /* Process all of PATTERN and match it against S char by char. */
> > +  while ((pattern_char = *pattern))
> > +    {
> > +      apr_size_t match_len = 0;
> > +
> > +      /* If PATTERN ended on a wildcard, S can be nothing but a match.
> */
> > +      pattern_char = *++pattern;
> You're assigning patern_char in while() condition and in while body.
> What is the point for such code?
>

That's a remnant of an older version of this function.


> I'm also find expression like "pattern_char = *++pattern" very hard to
> follow.
>

[...]

As I learnt recently, there is an APR function that already covers
all the odd pattern matching that we need here. Since r1617148.
the above code has been completely removed.


>
> > +apr_size_t
> > +svn_stringbuf_replace_all(svn_stringbuf_t *str,
> > +                          const char *to_find,
> > +                          const char *replacement)
> > +{
> Please add tests for this function. Code doesn't seems trivial and
> does potential dangerous things like direct manipulation of stringbuf
> using memcpy.
>

Done in r1617144.

>
> > +  apr_size_t replacements = 0;
> > +
> > +  apr_size_t current = 0;
> > +  apr_size_t original_length = str->len;
> > +
> > +  apr_size_t to_copy;
> > +  apr_size_t to_find_len;
> > +  apr_size_t replacement_len;
> > +  apr_size_t new_length;
> > +
> > +  /* Early exit. */
> > +  const char *pos = strstr(str->data, to_find);
> > +  if (pos == NULL)
> > +    return 0;
> > +
> > +  to_find_len = strlen(to_find);
> > +  replacement_len = strlen(replacement);
> > +
> > +  /* We will store the new contents behind the NUL terminator of the
> current
> > +   * data and track the total length in STR->LEN to make the
> reallocation
> > +   * code preserve both bits.  However, we need to keep the NUL between
> them
> > +   * to make strstr stop at that boundary. */
> Why you've implemented so complicated algorithm instread of calling
> existing svn_stringbuf_replace() for every match?
>

O(N) vs. O(N^2). There is certainly a lot of further tuning
potential in various cases (to_find_len == replacement_len etc.)
but the current implementation has decent performance and
good worst-case behaviour.

You're still reallocating buffer multiple times and moving data
> arround, so I doubt that it significantly more performant that simple
> implementation based on svn_stringbuf_replace().
>

You are mistaken. svn_stringbuf_t has O(N) guarantees for
growing to string length N.

[...]


> I'm also +1 Brane comment about using maintaner mode and do not commit
> code with compilation warnings.
>

I accidentally hadn't activated it for that branch (I often switch
between configuration options). Once activated it showed
exactly two warnings that were not deprecations. So, on my
machine, the code *is* clean.

To get any of the size conversion warnings that I fix occasionally,
I have to switch to macos.

-- Stefan^2.

Re: svn commit: r1614703 - in /subversion/branches/authzperf: BRANCH-README subversion/include/svn_string.h subversion/libsvn_repos/authz.c subversion/libsvn_subr/string.c

Reply via email to