A-Costa commented on issue #32650: URL: https://github.com/apache/airflow/issues/32650#issuecomment-1708539407
Hi @shahar1, I've published a very rudimentary draft PR, #34137.

The first issue I'm encountering is the following: while it was easy to adapt the `poke` method to use the `match_glob` parameter, it is not easy to adjust the `execute` method, in particular when the sensor is set with `deferrable=True`.

The problem here is that in deferrable mode the sensor uses the `GCSBlobTrigger`, which in turn uses the [`Bucket`](https://github.com/talkiq/gcloud-aio/blob/b170cf916f0ad52357bd3457ce5b905dd9170132/storage/gcloud/aio/storage/bucket.py#L30) class from the [gcloud-aio](https://github.com/talkiq/gcloud-aio) library. The `GCSBlobTrigger` calls the `blob_exists` method, which only works with an exact object name:

https://github.com/apache/airflow/blob/b9acffa81bf61dcf0c5553942c52629c7f75ebe2/airflow/providers/google/cloud/triggers/gcs.py#L101

Another method of that class, [`list_blobs`](https://github.com/talkiq/gcloud-aio/blob/b170cf916f0ad52357bd3457ce5b905dd9170132/storage/gcloud/aio/storage/bucket.py#L64), is in fact used by `GCSPrefixBlobTrigger`. The problem is that `list_blobs` only implements the `prefix` parameter, not a glob-matching one.

I'm now going to open an issue on the `gcloud-aio` repository and see whether they are willing to add this functionality; otherwise, I guess the only viable approach would be to implement it ourselves. Basically, it's the same problem you had to solve when implementing [`_list_blobs_with_match_glob`](https://github.com/apache/airflow/blob/b9acffa81bf61dcf0c5553942c52629c7f75ebe2/airflow/providers/google/cloud/hooks/gcs.py#L847), but for the async version.
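If we end up implementing it ourselves, one possible shape for the async counterpart is sketched below. This is a minimal sketch under stated assumptions, not the PR's code. It asks the server for the longest wildcard-free prefix of the glob through a prefix-only `list_blobs(prefix=...)` call, then matches the returned names locally. `list_blobs_with_match_glob`, `glob_to_regex` and `literal_prefix` are hypothetical names. The glob translation only approximates the GCS `matchGlob` rules (`**` crosses `/`, `*` and `?` do not) and does not handle `[...]` classes or `{a,b}` alternation. The `list_blobs` argument stands in for something like gcloud-aio's `Bucket.list_blobs`; its exact signature and return type (a list of object names) are assumptions here.

```python
import asyncio
import re
from typing import Awaitable, Callable, List


def literal_prefix(match_glob: str) -> str:
    """Return the part of the glob before the first wildcard, usable as a server-side prefix."""
    for idx, char in enumerate(match_glob):
        if char in "*?":
            return match_glob[:idx]
    return match_glob


def glob_to_regex(match_glob: str) -> "re.Pattern[str]":
    """Translate a subset of GCS-style glob syntax into a compiled regex.

    Approximation only: "**" matches any characters including "/", while "*" and "?"
    never cross "/". "[...]" classes and "{a,b}" alternation are NOT supported
    and are matched literally.
    """
    parts = []
    i = 0
    while i < len(match_glob):
        if match_glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif match_glob[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif match_glob[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(match_glob[i]))
            i += 1
    return re.compile("".join(parts))


async def list_blobs_with_match_glob(
    list_blobs: Callable[..., Awaitable[List[str]]], match_glob: str
) -> List[str]:
    """Hypothetical async helper: narrow by literal prefix on the server, filter by glob locally.

    `list_blobs` is assumed to be a coroutine function that accepts a `prefix`
    keyword and returns object names (e.g. a wrapper around gcloud-aio's Bucket.list_blobs).
    """
    names = await list_blobs(prefix=literal_prefix(match_glob))
    pattern = glob_to_regex(match_glob)
    # fullmatch so the glob must cover the whole object name, not just a part of it
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":

    async def fake_list_blobs(prefix: str = "") -> List[str]:
        # In-memory stand-in for a bucket listing; no network access.
        names = ["data/2023/a.csv", "data/2023/b.json", "data/2023/sub/c.csv", "other/x.csv"]
        return [name for name in names if name.startswith(prefix)]

    print(asyncio.run(list_blobs_with_match_glob(fake_list_blobs, "data/2023/*.csv")))
    # ['data/2023/a.csv']
```

Filtering client-side means every poll still lists all names under the literal prefix, so a narrow prefix matters for large buckets; native glob support in `gcloud-aio` would let the server do that filtering instead.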
