Fokko commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3
URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r210182865
 
 

 ##########
 File path: airflow/sensors/hdfs_sensor.py
 ##########
 @@ -17,103 +17,231 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import re
-import sys
-from builtins import str
+import posixpath
 
 from airflow import settings
-from airflow.hooks.hdfs_hook import HDFSHook
+from airflow.hooks.hdfs_hook import HdfsHook
 from airflow.sensors.base_sensor_operator import BaseSensorOperator
 from airflow.utils.decorators import apply_defaults
-from airflow.utils.log.logging_mixin import LoggingMixin
 
 
-class HdfsSensor(BaseSensorOperator):
-    """
-    Waits for a file or folder to land in HDFS
+class HdfsFileSensor(BaseSensorOperator):
+    """Sensor that waits for files matching a specific (glob) pattern to land in HDFS.
+
+    :param str file_pattern: Glob pattern to match.
+    :param str conn_id: Connection to use.
+    :param Iterable[FilePathFilter] filters: Optional list of filters that can be
+        used to apply further filtering to any file paths matching the glob pattern.
+        Any files that fail a filter are dropped from consideration.
+    :param int min_size: Minimum size (in MB) for files to be considered. Can be used
+        to filter any intermediate files that are below the expected file size.
+    :param Set[str] ignore_exts: File extensions to ignore. By default, files with
+        a '_COPYING_' extension are ignored, as these represent temporary files.
 
 Review comment:
   Good point @XD-DENG

   We could also strip the leading `.` from the extension so that both cases work.
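
   A minimal sketch of that idea, assuming the sensor compares each path's extension against `ignore_exts`. The helper names `normalize_ext` and `has_ignored_ext` are hypothetical and not part of this PR; only `posixpath.splitext` from the standard library is relied on.

```python
import posixpath


def normalize_ext(ext):
    """Drop one leading dot, so '._COPYING_' and '_COPYING_' compare equal."""
    return ext[1:] if ext.startswith(".") else ext


def has_ignored_ext(path, ignore_exts):
    """Return True if the extension of `path` is listed in `ignore_exts`.

    Hypothetical helper: entries in `ignore_exts` may be given with or
    without the leading dot.
    """
    _, ext = posixpath.splitext(path)
    if not ext:
        # No extension at all, so nothing to ignore.
        return False
    ignored = {normalize_ext(e) for e in ignore_exts}
    return normalize_ext(ext) in ignored
```

   With this normalization, `ignore_exts={'_COPYING_'}` and `ignore_exts={'._COPYING_'}` would behave the same way.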

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
