saveriogzz commented on a change in pull request #17436:
URL: https://github.com/apache/airflow/pull/17436#discussion_r683254408
##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -212,7 +218,9 @@ def delete_directory(self, path: str) -> None:
conn = self.get_conn()
conn.rmdir(path)
- def retrieve_file(self, remote_full_path: str, local_full_path: str) ->
None:
+ def retrieve_file(
+ self, remote_full_path: str, local_full_path: str, fnmatch_pattern:
Optional[str] = None
+ ) -> None:
Review comment:
@uranusjr, in #15409, already mentioned that this implementation could
cause headaches to users inasmuch "it subtly changes semantics depending on the
arguments". We can start from here and think about a better solution.
##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -250,14 +265,22 @@ def delete_file(self, path: str) -> None:
conn = self.get_conn()
conn.remove(path)
- def get_mod_time(self, path: str) -> str:
+ def get_mod_time(self, path: str, fnmatch_pattern: Optional[str] = None)
-> str:
"""
Returns modification time.
:param path: full path to the remote file
:type path: str
+ :param fnmatch_pattern: optional pattern to filter the
remote_full_path files
+ :type: fnmatch_pattern: Optional[str]
"""
conn = self.get_conn()
+ if fnmatch_pattern:
+ for file in conn.listdir(path):
+ if fnmatch(file, fnmatch_pattern):
+ path = file
+ break
+
Review comment:
@uranusjr , how about this method called by the
[SFTPSensor](https://github.com/apache/airflow/blob/main/airflow/providers/sftp/sensors/sftp.py#L49)?
Should we create a different method to be able to catch all the files
matching the pattern or we just wait for the the first of the list as it is
implemented now?
##########
File path: airflow/providers/sftp/sensors/sftp.py
##########
@@ -57,13 +57,23 @@ def __init__(
def poke(self, context: dict) -> bool:
self.hook = SFTPHook(self.sftp_conn_id)
- self.log.info('Poking for %s', self.path)
- try:
- mod_time = self.hook.get_mod_time(self.path, self.fnmatch_pattern)
- self.log.info('Found File %s last modified: %s', str(self.path),
str(mod_time))
- except OSError as e:
- if e.errno != SFTP_NO_SUCH_FILE:
- raise e
- return False
+ if not self.fnmatch_pattern:
+ self.log.info('Poking for %s', self.path)
+ try:
+ mod_time = self.hook.get_mod_time(self.path)
+ self.log.info('Found File %s last modified: %s',
str(self.path), str(mod_time))
+ except OSError as e:
+ if e.errno != SFTP_NO_SUCH_FILE:
+ raise e
+ return False
+ else:
+ self.log.info('Poking for files matching pattern %s into %s',
self.fnmatch_pattern, self.path)
+ try:
+ mod_time = self.hook.get_mod_time_pattern(self.path,
self.fnmatch_pattern)
+ self.log.info('Found Files last modified %s', str(mod_time))
+ except OSError as e:
+ if e.errno != SFTP_NO_SUCH_FILE:
+ raise e
+ return False
Review comment:
here we have two options:
- not having specified `fnmatch_pattern`
- having specified it
In the first case, the parameter `path` is as usual the full path to the
file.
In the second case, the `path` will be the path to the directory whilst
`fnmatch_pattern` will be useful to catch all the files we are interested in.
In this second case, the new method
[get_mod_time_pattern](https://github.com/saveriogzz/airflow/blob/AIRFLOW-15332/airflow/providers/sftp/hooks/sftp.py#L311),
which returns a dictionaries with full paths to files as keys and their
modification time as values.
##########
File path: airflow/providers/sftp/hooks/sftp.py
##########
@@ -265,25 +284,51 @@ def delete_file(self, path: str) -> None:
conn = self.get_conn()
conn.remove(path)
- def get_mod_time(self, path: str, fnmatch_pattern: Optional[str] = None)
-> str:
+ def delete_file_pattern(self, dir_path: str, fnmatch_pattern: str) -> None:
"""
- Returns modification time.
+ Removes files matching the pattern on the FTP Server
+
+ :param dir_path: full path to the remote directory
+ :type dir_path: str
+ :param fnmatch_pattern: pattern to filter the dir_path files
+ :type fnmatch_pattern: str
+ """
+ conn = self.get_conn()
+ for file in conn.listdir(dir_path):
Review comment:
this is wrong because it should raise errors in case no matches are
found, at the moment it's not
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]