NBardelot opened a new issue #15069:
URL: https://github.com/apache/airflow/issues/15069


   Hi, I cannot currently reproduce this as a bug, but I'm very confident some 
people will stumble on it in the long run (a colleague of mine just did, 
using Airflow 1.10, and I think the issue is the same in Airflow 2). So I am 
opening this issue to at least document the subject.
   
   The filesystem sensor has used glob matching since this PR: 
https://github.com/apache/airflow/pull/5358
   
   Yet the sensor accepts hooks that refer to a remote FS just as readily as 
local ones, and glob does not handle a remote FS.
   
   On the one hand, the Python documentation states that glob() uses a mix of 
os.scandir() and fnmatch.fnmatch(), which makes the code suitable only for a 
local FS. On the other hand, Airflow provides hooks like the SFTPHook which 
manage a remote FS (not available to "os"), and those hooks are eligible for 
the sensor via inheritance.
   
   Thus, using a path with a glob pattern together with a hook to a remote FS 
should result in inconsistent behaviour:
   - either you're lucky, and glob() does not find an equivalent path locally 
and just reports that the path does not exist (the sensor will never 
trigger);
   - or, in the worst case, the sensor triggers for a file that exists locally 
but not on the remote FS as expected (a false trigger).
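
   To make the second case concrete, here is a minimal sketch (not Airflow 
code). It assumes a hypothetical `remote_listing` standing in for what a 
remote hook would return, and shows that glob() only ever inspects the local 
disk:

```python
import glob
import os
import tempfile

# Hypothetical stand-in for a remote directory listing; a real remote hook
# would fetch this over the network. Here the remote directory is empty.
remote_listing = []

with tempfile.TemporaryDirectory() as local_dir:
    # A file that exists only on the machine running the sensor.
    open(os.path.join(local_dir, "data_2021.csv"), "w").close()

    pattern = os.path.join(local_dir, "data_*.csv")
    # glob() walks the local filesystem and knows nothing about the remote.
    local_matches = glob.glob(pattern)

    # The pattern matches locally although the remote side has no such file.
    false_trigger = bool(local_matches) and not remote_listing
```

   In this sketch `false_trigger` ends up true: the local match says nothing 
about the remote FS.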
   
   In my opinion this should be fixed in two ways: 
   
   1. glob compatibility should be exposed as a function of the hook 
(hook.hasGlobbing() -> true/false; false by default) so that the sensor can 
adapt its behaviour
   2. the sensor should avoid relying on globs (which are not 100% portable) 
and allow things like startsWith or endsWith path searches (implemented by a 
directory listing + lookup, which would be the portable way to do things) 
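
   A rough sketch of both ideas, to illustrate the proposal only. All names 
here (`BaseFSHook`, `FakeRemoteHook`, `has_globbing`, `list_directory`, 
`find_matches`) are hypothetical and do not exist in Airflow; 
`has_globbing` is simply a snake_case spelling of the hasGlobbing() idea:

```python
from typing import Dict, List


class BaseFSHook:
    """Hypothetical base hook; not Airflow's actual class."""

    def has_globbing(self) -> bool:
        # Proposed default: a hook does not support globbing unless it says so.
        return False

    def list_directory(self, path: str) -> List[str]:
        raise NotImplementedError


class FakeRemoteHook(BaseFSHook):
    """In-memory stand-in for a remote hook, used only for this example."""

    def __init__(self, files: Dict[str, List[str]]) -> None:
        self._files = files  # maps a directory to the names it contains

    def list_directory(self, path: str) -> List[str]:
        return list(self._files.get(path, []))


def find_matches(hook: BaseFSHook, directory: str,
                 prefix: str = "", suffix: str = "") -> List[str]:
    """Portable lookup: list the directory via the hook, then filter names."""
    names = hook.list_directory(directory)
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]
```

   A sensor could then check `hook.has_globbing()` first and fall back to 
something like `find_matches(hook, "/in", prefix="data_", suffix=".csv")` 
when the hook does not support globs.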
   
   BR. And thanks for existing code, bug or not :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

