hussein-awala commented on code in PR #56509:
URL: https://github.com/apache/airflow/pull/56509#discussion_r2422636503
##########
airflow-core/src/airflow/utils/file.py:
##########
@@ -102,24 +102,21 @@ def compile(pattern: str, base_dir: Path,
definition_file: Path) -> _IgnoreRule
relative_to = definition_file.parent
ignore_pattern = GitWildMatchPattern(pattern)
- return _GlobIgnoreRule(ignore_pattern, relative_to)
+ return _GlobIgnoreRule(wild_match_pattern=ignore_pattern,
relative_to=relative_to)
@staticmethod
def match(path: Path, rules: list[_IgnoreRule]) -> bool:
- """Match a list of ignore rules against the supplied path."""
+ """Match a list of ignore rules against the supplied path, accounting
for exclusion rules and ordering."""
matched = False
- for r in rules:
- if not isinstance(r, _GlobIgnoreRule):
- raise ValueError(f"_GlobIgnoreRule cannot match rules of type:
{type(r)}")
- rule: _GlobIgnoreRule = r # explicit typing to make mypy play
nicely
+ for rule in rules:
+ if not isinstance(rule, _GlobIgnoreRule):
+ raise ValueError(f"_GlobIgnoreRule cannot match rules of type:
{type(rule)}")
rel_path = str(path.relative_to(rule.relative_to) if
rule.relative_to else path.name)
- matched = rule.wild_match_pattern.match_file(rel_path) is not None
-
- if matched:
- if not rule.wild_match_pattern.include:
- return False
-
- return matched
+ if (
+ rule.wild_match_pattern.include is not None
+ and rule.wild_match_pattern.match_file(rel_path) is not None
+ ):
+ matched = rule.wild_match_pattern.include
return matched
Review Comment:
TBH, I don't know what was the behaviour of our ignore file, but I know that
it should have the same behavior as the other ignore files (git, docker, ...).
Let's take [.gitignore](https://git-scm.com/docs/gitignore) as reference:
> Each line in a gitignore file specifies a pattern. When deciding whether
to ignore a path, Git normally checks gitignore patterns from multiple sources,
with the following order of precedence, from highest to lowest (within one
level of precedence, the last matching pattern decides the outcome)
So since the last rule that match the file decides if it should be included
or excluded, and since it's useless to process all the rules if there is a
match in one of last rules in the file, why we don't process them in reverse
order and return the outcome of the first match:
```python
@staticmethod
def match(path: Path, rules: list[_IgnoreRule]) -> bool:
"""Match a list of ignore rules against the supplied path."""
for r in reversed(rules):
if not isinstance(r, _GlobIgnoreRule):
raise ValueError(f"_GlobIgnoreRule cannot match rules of
type: {type(r)}")
rule: _GlobIgnoreRule = r # explicit typing to make mypy play
nicely
rel_path = str(path.relative_to(rule.relative_to) if
rule.relative_to else path.name)
if rule.wild_match_pattern.match_file(rel_path) is not None: #
matched
return bool(rule.wild_match_pattern.include) # included
return False
```
For your question about `rule.wild_match_pattern.include`, if I remember
correctly, some patterns are only a matcher, without supporting
inclusion/exclusion logic, and `include` is represented as a `None` in that
case. But these patterns are not `GitWildMatchPattern` so we should be safe.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]