On Mon, Aug 4, 2014 at 2:53 PM, Ivan Zhakov <i...@visualsvn.com> wrote:
> On 30 July 2014 20:27, <stef...@apache.org> wrote: > > Author: stefan2 > > Date: Wed Jul 30 16:27:44 2014 > > New Revision: 1614703 > > > > URL: http://svn.apache.org/r1614703 > > Log: > > On the authzperf branch: Implement support for generic / complex > > wildcard patterns like "/foo*bar*baz/". > > > > To do this efficiently requires normalized pattern paths, so we > > add the normalization logic as described here: > > https://wiki.apache.org/subversion/AuthzImprovements > > > Stefan, > > Please find my comments inline. > [...] > +normalize_wildcards(const char *path, > > + apr_pool_t *result_pool) > > +{ > > + const char *pos; > > + svn_stringbuf_t *buffer = svn_stringbuf_create(path, result_pool); > > + > > + /* Reduce sequences variable-length segment matches to single segment > > + * matches with the other segment patterns reduced to "*". */ > > + while (svn_stringbuf_replace_all(buffer, "/**/**/", "/*/**/")) > > + ; > > + > > + /* Our tree traversal is more efficient if we put variable segment > count > > + * wildcards last. */ > > + while (svn_stringbuf_replace_all(buffer, "/**/*/", "/**/*/")) > > + ; > > + > > + /* Reduce trailing "**" a single "*". */ > > + while ( buffer->len > 1 > > + && buffer->data[buffer->len-1] == '*' > > + && buffer->data[buffer->len-2] == '*') > > + svn_stringbuf_remove(buffer, buffer->len-1, 1); > > + > > + /* Reduce "**" _inside_ a segment to a single "*". */ > > + for (pos = strstr(buffer->data, "**"); pos; pos = strstr(pos, "**")) > > + if ((pos > buffer->data && pos[-1] != '/') || (pos[2] != '/')) > > + svn_stringbuf_remove(buffer, pos - buffer->data, 1); > > + else > > + ++pos; > Quick quiz: I'm only one dev@ list who do not understand magic peace > of code above? I honestly hope you are. Nice typo, though ;) > I'm fine if majority of Subversion devs do not consider > such code as cryptic unreviewable magic. > [...] > > +static svn_boolean_t > > +match_pattern(const char *s, > > + const char *pattern) > > +{ > > + /* Matching a wildcard pattern is trival: > > + * PATTERN can be considered a list of literal strings separated by > '*'. > > + * We simply have to find all sub-strings in that order, i.e. we can > do > > + * so greedily. Be careful to match prefix and suffix correctly. > > + */ > > + char pattern_char, s_char; > > + > > + /* The prefix part of PATTERN needs special treatment as we can't just > > + * match any substring of S. */ > > + if (pattern[0] != '*') > You'are using inconsitent style to access first char in pattern: > pattern[0] and *pattern. It makes code review harder. Why you're doing > this way? > You are right that both are interchangeable. However, I prefer the index notation when treating something as a string and the pointer notation when it is (moving) pointer. Since both are the same type in C, there is some gray area where I would useither. > > > + { > > + apr_size_t match_len; > > + > > + /* match_to_next_wildcard() assumes a that the first char > matches. */ > > + if (pattern[0] != s[0]) > > + return FALSE; > > + > > + /* Match up to but not beyond the next wildcard. */ > > + match_len = match_to_next_wildcard(s, pattern); > > + if (match_len == 0) > > + return FALSE; > > + > > + /* Continue at next wildcard or end-of-string. */ > > + pattern += match_len; > > + s += match_len; > > + } > > + > > + /* Process all of PATTERN and match it against S char by char. */ > > + while ((pattern_char = *pattern)) > > + { > > + apr_size_t match_len = 0; > > + > > + /* If PATTERN ended on a wildcard, S can be nothing but a match. > */ > > + pattern_char = *++pattern; > You're assigning patern_char in while() condition and in while body. > What is the point for such code? > That's a remnant of an older version of this function. > I'm also find expression like "pattern_char = *++pattern" very hard to > follow. > [...] As I learnt recently, there is an APR function that already covers all the odd pattern matching that we need here. Since r1617148. the above code has been completely removed. > > > +apr_size_t > > +svn_stringbuf_replace_all(svn_stringbuf_t *str, > > + const char *to_find, > > + const char *replacement) > > +{ > Please add tests for this function. Code doesn't seems trivial and > does potential dangerous things like direct manipulation of stringbuf > using memcpy. > Done in r1617144. > > > + apr_size_t replacements = 0; > > + > > + apr_size_t current = 0; > > + apr_size_t original_length = str->len; > > + > > + apr_size_t to_copy; > > + apr_size_t to_find_len; > > + apr_size_t replacement_len; > > + apr_size_t new_length; > > + > > + /* Early exit. */ > > + const char *pos = strstr(str->data, to_find); > > + if (pos == NULL) > > + return 0; > > + > > + to_find_len = strlen(to_find); > > + replacement_len = strlen(replacement); > > + > > + /* We will store the new contents behind the NUL terminator of the > current > > + * data and track the total length in STR->LEN to make the > reallocation > > + * code preserve both bits. However, we need to keep the NUL between > them > > + * to make strstr stop at that boundary. */ > Why you've implemented so complicated algorithm instread of calling > existing svn_stringbuf_replace() for every match? > O(N) vs. O(N^2). There is certainly a lot of further tuning potential in various cases (to_find_len == replacement_len etc.) but the current implementation has decent performance and good worst-case behaviour. You're still reallocating buffer multiple times and moving data > arround, so I doubt that it significantly more performant that simple > implementation based on svn_stringbuf_replace(). > You are mistaken. svn_stringbuf_t has O(N) guarantees for growing to string length N. [...] > I'm also +1 Brane comment about using maintaner mode and do not commit > code with compilation warnings. > I accidentally hadn't activated it for that branch (I often switch between configuration options). Once activated it showed exactly two warnings that were not deprecations. So, on my machine, the code *is* clean. To get any of the size conversion warnings that I fix occasionally, I have to switch to macos. -- Stefan^2.