uranusjr commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1058051361


##########
airflow/dag_processing/manager.py:
##########
@@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session):
         :param session: session for ORM operations
         """
         query = session.query(errors.ImportError)
-        if self._file_paths:
-            query = query.filter(~errors.ImportError.filename.in_(self._file_paths))
+        files = list_py_file_paths(self._dag_directory, include_examples=False, include_zip_paths=True)

Review Comment:
   I think there are two possible alternatives. One is to introduce a new 
attribute on `DagFileProcessorManager` that stores the “full” paths, so we can 
use it instead of `_file_paths` here. The other is to introduce a new column on 
`ImportError` that stores the filesystem path (i.e. the path to the zip file), 
so we can filter it against `_file_paths`.
   
   The root issue here is that both `_file_paths` and `ImportError.filename` 
have a double meaning: each can represent either an actual filesystem entry 
(the path to an actual file) or a Python code loading target (the path the 
interpreter loads from). Right now `_file_paths` is a list of filesystem 
entries, while `ImportError.filename` is a code target, so comparing them 
directly is fundamentally not going to work.
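
   To illustrate the mismatch, here is a minimal standalone sketch, not code 
from this PR. It assumes that for a zipped DAG the stored `filename` looks like 
`<zip path>/<module inside the zip>`; the sample paths and the 
`filesystem_entry` helper are hypothetical and only there to show why a plain 
membership test flags the zipped entry as nonexistent.

```python
import posixpath

# Hypothetical stand-in for `_file_paths`: real filesystem entries only.
file_paths = {"/dags/plain_dag.py", "/dags/bundle.zip"}

# Hypothetical stand-in for stored `ImportError.filename` values (code targets).
import_error_filenames = [
    "/dags/plain_dag.py",
    "/dags/bundle.zip/broken_dag.py",  # module inside a zip (assumed format)
    "/dags/deleted_dag.py",            # file that no longer exists
]


def filesystem_entry(code_target, known_files):
    """Walk up from a code target until a known filesystem entry is found.

    Returns the code target unchanged if no ancestor is a known entry.
    """
    path = code_target
    while path not in known_files:
        parent = posixpath.dirname(path)
        if parent == path:  # reached the root without a match
            return code_target
        path = parent
    return path


# Direct comparison: the zipped DAG's error is wrongly treated as stale.
naive_stale = [f for f in import_error_filenames if f not in file_paths]

# Comparing on the filesystem entry: only the deleted file is stale.
resolved_stale = [
    f
    for f in import_error_filenames
    if filesystem_entry(f, file_paths) not in file_paths
]
```

   Storing the filesystem path in its own column would give the same result 
without having to derive it from the code target at query time.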



