EmadMokhtar opened a new issue #10426:
URL: https://github.com/apache/airflow/issues/10426


   **Description**
   
   Support passing multiple prefixes to the `GoogleCloudStorageListOperator` and 
`GoogleCloudStorageDeleteOperator` operators.
   
   **Use case / motivation**
   
   I have this folder structure in a GCS bucket:
   
   ```
   +-- year={year}
   |   +-- month={month}
   |       +--day={day}
   |           +-- topic={topic1}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic2}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic3}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic4}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic5}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic6}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic7}
   |       +--day={day}
   |           +-- topic={topic1}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic2}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic3}
   |                 +--file 1
   |                 +--file 2
   |                 +--file 3
   |           +-- topic={topic4}
   |           +-- topic={topic5}
   |           +-- topic={topic6}
   |           +-- topic={topic7}
   |           ....
   ```
   
   What I need to do is delete one day of objects, for example everything 
under `year=2020/month=08/day=19`. That is easy with `gsutil`, which accepts a 
wildcard such as `year=2020/month=08/day=19/*`. With the REST API it is not 
possible, even with a prefix, because no single prefix returns all the objects 
inside a folder. My workaround is to use multiple prefixes and list the objects 
for each one. Unfortunately, the operators accept only one prefix.
   
   **Prefixes used**
   - `year=2020/month=08/day=19`
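
A minimal sketch of the workaround described above: one listing call per prefix, with the results merged into a single de-duplicated list that could then be handed to a delete step. Everything here is an assumption for illustration. `list_for_prefixes`, `fake_list_objects`, `FAKE_BUCKET`, and the `topic=a` / `topic=b` names are hypothetical, and the in-memory lister only stands in for a real listing call (for example, a thin wrapper around the GCS hook's `list` method).

```python
from typing import Callable, Iterable, List


def list_for_prefixes(
    list_objects: Callable[[str, str], Iterable[str]],
    bucket: str,
    prefixes: Iterable[str],
) -> List[str]:
    """Collect object names for several prefixes, one listing call per prefix."""
    seen = set()
    names: List[str] = []
    for prefix in prefixes:
        for name in list_objects(bucket, prefix):
            # Overlapping prefixes can return the same object more than once.
            if name not in seen:
                seen.add(name)
                names.append(name)
    return names


# Hypothetical in-memory stand-in for a bucket, used only for illustration.
FAKE_BUCKET = [
    "year=2020/month=08/day=19/topic=a/file1",
    "year=2020/month=08/day=19/topic=b/file1",
    "year=2020/month=08/day=20/topic=a/file1",
]


def fake_list_objects(bucket: str, prefix: str) -> List[str]:
    """Return the names in FAKE_BUCKET that start with the given prefix."""
    return [name for name in FAKE_BUCKET if name.startswith(prefix)]


if __name__ == "__main__":
    day_prefixes = [
        "year=2020/month=08/day=19/topic=a",
        "year=2020/month=08/day=19/topic=b",
    ]
    for name in list_for_prefixes(fake_list_objects, "my-bucket", day_prefixes):
        print(name)
```

Passing the lister in as a callable keeps the merging logic independent of any particular client, so the same loop would work inside an operator that accepted a list of prefixes.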
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

