hussein-awala commented on code in PR #56509:
URL: https://github.com/apache/airflow/pull/56509#discussion_r2422636503


##########
airflow-core/src/airflow/utils/file.py:
##########
@@ -102,24 +102,21 @@ def compile(pattern: str, base_dir: Path, 
definition_file: Path) -> _IgnoreRule
             relative_to = definition_file.parent
 
         ignore_pattern = GitWildMatchPattern(pattern)
-        return _GlobIgnoreRule(ignore_pattern, relative_to)
+        return _GlobIgnoreRule(wild_match_pattern=ignore_pattern, 
relative_to=relative_to)
 
     @staticmethod
     def match(path: Path, rules: list[_IgnoreRule]) -> bool:
-        """Match a list of ignore rules against the supplied path."""
+        """Match a list of ignore rules against the supplied path, accounting 
for exclusion rules and ordering."""
         matched = False
-        for r in rules:
-            if not isinstance(r, _GlobIgnoreRule):
-                raise ValueError(f"_GlobIgnoreRule cannot match rules of type: 
{type(r)}")
-            rule: _GlobIgnoreRule = r  # explicit typing to make mypy play 
nicely
+        for rule in rules:
+            if not isinstance(rule, _GlobIgnoreRule):
+                raise ValueError(f"_GlobIgnoreRule cannot match rules of type: 
{type(rule)}")
             rel_path = str(path.relative_to(rule.relative_to) if 
rule.relative_to else path.name)
-            matched = rule.wild_match_pattern.match_file(rel_path) is not None
-
-            if matched:
-                if not rule.wild_match_pattern.include:
-                    return False
-
-                return matched
+            if (
+                rule.wild_match_pattern.include is not None
+                and rule.wild_match_pattern.match_file(rel_path) is not None
+            ):
+                matched = rule.wild_match_pattern.include
 
         return matched

Review Comment:
   TBH, I don't know what was the behaviour of our ignore file, but I know that 
it should have the same behavior as the other ignore files (git, docker, ...).
   
   Let's take [.gitignore](https://git-scm.com/docs/gitignore) as reference:
   > Each line in a gitignore file specifies a pattern. When deciding whether 
to ignore a path, Git normally checks gitignore patterns from multiple sources, 
with the following order of precedence, from highest to lowest (within one 
level of precedence, the last matching pattern decides the outcome)
   
   So since the last rule that match the file decides if it should be included 
or excluded, and since it's useless to process all the rules if there is a 
match in one of last rules in the file, why we don't process them in reverse 
order and return the outcome of the first match:
   
   ```python
       @staticmethod
       def match(path: Path, rules: list[_IgnoreRule]) -> bool:
           """Match a list of ignore rules against the supplied path."""
           for r in reversed(rules):
               if not isinstance(r, _GlobIgnoreRule):
                   raise ValueError(f"_GlobIgnoreRule cannot match rules of 
type: {type(r)}")
   
               rule: _GlobIgnoreRule = r  # explicit typing to make mypy play 
nicely
               rel_path = str(path.relative_to(rule.relative_to) if 
rule.relative_to else path.name)
   
               if rule.wild_match_pattern.match_file(rel_path) is not None:  # 
matched
                   return bool(rule.wild_match_pattern.include)  # included
   
           return False
   ```
   
   For your question about `rule.wild_match_pattern.include`, if I remember 
correctly, some patterns are only a matcher, without supporting 
inclusion/exclusion logic, and `include` is represented as a `None` in that 
case. But these patterns are not `GitWildMatchPattern` so we should be safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to