On 10/03/2012 08:17 PM, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclo...@gmail.com> writes:
>
>> There's an interesting case: "**foo". According to our rules, that
>> pattern does not contain slashes therefore is basename match. But some
>> might find that confusing because "**" can match slashes,...
>
> By "our rules", if you mean "if a pattern has slash, it is anchored",
> that obviously need to be updated with this series, if "**" is meant
> to match multiple hierarchies.
>
>> I think the latter makes more sense. When users put "**" they expect
>> to match some slashes. But that may call for a refactoring in
>> path_matches() in attr.c. Putting strstr(pattern, "**") in that
>> matching function may increase overhead unnecessarily.
>>
>> The third option is just die() and let users decide either "*foo",
>> "**/foo" or "/**foo", never "**foo".
>
> For the double-star at the beginning, you should just turn it into "**/"
> if it is not followed by a slash internally, I think.
>
> What is the semantics of ** in the first place? Is it described to
> a reasonable level of detail in the documentation updates? For
> example does "**foo" match "afoo", "a/b/foo", "a/bfoo", "a/foo/b",
> "a/bfoo/c"? Does "x**y" match "xy", "xay", "xa/by", "x/a/y"?
>
> I am guessing that the only sensible definition is that "**"
> requires anything that comes before it (if exists) is at a proper
> hierarchy boundary, and anything matches it is also at a proper
> hierarchy boundary, so "x**y" matches "x/a/y" and not "xy", "xay",
> nor "xa/by" in the above example. If "x**y" can match "xy" or "xay"
> (or "**foo" can match "afoo"), it would be unreasonable to say it
> implies the pattern is anchored at any level, no?

Given that there is no obvious interpretation for what a construct like
"x**y" would mean, and many plausible guesses (most of which sound
rather useless), I suggest that we forbid it. This will make the
feature easier to explain and make .gitignore files that use it easier
to understand.

I think that 98% of the usefulness of "**" would be in constructs where
it stands in for whole path components, like "**/SOMETHING" or
"SOMETHING/**/SOMETHING"; in other words, where its use matches the
regexp "(^|/)\*\*/". In these constructs the only ambiguity is whether
"**/" matches regexp "([^/]+/)+" or "([^/]+/)*" (e.g., whether
"foo/**/bar" matches "foo/bar"). I personally prefer the second,
because the first behavior can be had using the second interpretation
by writing "SOMETHING/*/**/SOMETHING", whereas the second behavior
cannot be implemented in terms of the first in a single line of the
.gitignore file.

Optionally, one might also like to support "SOMETHING/**" or "**" alone
in the obvious ways.

As for the implementation, it is quite easy to textually convert a glob
pattern, including "**" parts, into a regexp. I happen to have written
some Python code that does this for another project (see below). An
obvious optimization would be to read any literal parts of the path off
the beginning of the glob pattern and only use regexps for the tail
part. Would a regexp-based implementation be too slow?

Michael

import os
import re

# Matches any single character that is not a path separator.
_filename_char_pattern = r'[^/]'

# Glob tokens and their regexp translations.  Order matters: the '**'
# forms have to be tried before the single '*'.
_glob_patterns = [
    ('?', _filename_char_pattern),
    ('/**', r'(/.+)?'),
    ('**/', r'(.+/)?'),
    ('*', _filename_char_pattern + r'*'),
    ]

def glob_to_regexp(pattern):
    pattern = os.path.normpath(pattern) # remove trivial redundancies

    if pattern == '**':
        # This case has to be handled separately because it doesn't
        # involve a '/' character adjacent to the '**' pattern.  (Such
        # slashes otherwise have to be considered part of the pattern
        # to handle the matching of zero path components.)  Match any
        # non-empty path that neither starts nor ends with a slash.
        return re.compile(
            r'^' + _filename_char_pattern
            + r'(.*' + _filename_char_pattern + r')?$'
            )

    regexp = [r'^']
    i = 0
    while i < len(pattern):
        for (s, r) in _glob_patterns:
            if pattern.startswith(s, i):
                regexp.append(r)
                i += len(s)
                break
        else:
            # No glob token starts here, so AFAIK it's a normal
            # character.  Escape it and add it to the regexp.
            regexp.append(re.escape(pattern[i]))
            i += 1

    regexp.append(r'$')
    return re.compile(''.join(regexp))

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
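
The restriction suggested above (accept "**" only where it stands in for whole path components, and reject things like "x**y" or "**foo") could be checked before any conversion to a regexp. The following is only a minimal sketch under that assumption: the function name double_star_is_well_placed is hypothetical, not anything in git, and it also accepts the optional "SOMETHING/**" and bare "**" forms.

```python
def double_star_is_well_placed(pattern):
    """Return True if every '**' in pattern is a whole path component.

    Hypothetical helper, not part of git.  Splitting on '/' means a
    component holding '**' is acceptable only if it is exactly '**',
    i.e. the '**' is bounded by the start of the pattern or a slash
    on the left and by a slash or the end of the pattern on the right.
    """
    for component in pattern.split('/'):
        if '**' in component and component != '**':
            return False
    return True


if __name__ == '__main__':
    for p in ['**/foo', 'foo/**/bar', 'foo/**', '**', 'x**y', '**foo']:
        print(p, double_star_is_well_placed(p))
```

A caller could reject or warn about any pattern for which this returns False, which leaves only the "([^/]+/)+" versus "([^/]+/)*" question to settle for the accepted forms.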