geraj1010 commented on issue #41472: URL: https://github.com/apache/airflow/issues/41472#issuecomment-2303477666
> > > > Is there an issue with using an `SFTPSensor` for checking file existence, prior to running `SFTPToGCSOperator`? > > > > > > > > > The SFTP sensor will wait until the file is detected. This issue is to determine whether the task should fail if the file is not found. > > > > > > I see. In that case, I don't think it makes sense to have the task succeed if a file is not found. A 'success' would not have much meaning anymore besides that the task simply ran. The interpretation of 'success' would become ambiguous i.e. how can you tell if a file was indeed moved or not without having to further dig into the logs? In addition, it doesn't feel very 'airflow-like' since we would be overloading the operator with functionality that already exists through a sensor. > > There is `soft_fail` for sensors too, if you want the sensor to 'skip' instead of 'fail'. > > I agree with your point, but this is an optional parameter. By default, it will fail the task if the file is not found. If the user sets the `fail_on_sftp_file_not_exist` to False, it would make some difference. > > > I'm not sure what your DAG looks like, but perhaps you can use a `SFTPSensor` and trigger rules for your downstream tasks? > > The final task of my DAG is looking for a file, and the dag_run should not fail, though the file is not present. And I don't want to use the sensor to poke for a file. That's unfortunate you don't want to use a sensor for checking for file existence. If the sensor failed (i.e. file didn't exist after some time), you could have it `soft_fail` and it would skip (instead of fail) and you could choose which downstream tasks get skipped or ran based on trigger rules. At any rate, I think as long as you use the trigger rule `all_done` for one (or more) of the downstream tasks, the dag_run will still be marked as successful. So you could have your `SFTPToGCSOperator` task fail (because the file doesn't exist on SFTP), and have the trigger rule for all other downstream tasks set to `all_done`, and they will still run and the dag_run will still be considered successful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
