shunping commented on code in PR #33611:
URL: https://github.com/apache/beam/pull/33611#discussion_r1937608203


##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -247,13 +247,35 @@ def open(
   def delete(self, path):
     """Deletes the object at the given GCS path.
 
+    If the path is a directory (prefix), it deletes all blobs under that prefix.
+
     Args:
       path: GCS file path pattern in the form gs://<bucket>/<name>.
     """
     bucket_name, blob_name = parse_gcs_path(path)
     bucket = self.client.bucket(bucket_name)
+
+    # Check if the blob is a directory (prefix) by listing objects
+    # under that prefix.
+    blobs = list(bucket.list_blobs(prefix=blob_name))

Review Comment:
   Adding a `recursive` parameter that defaults to `False` sounds good to me, 
since it extends the functionality of `gcsio.delete()` without breaking 
compatibility.
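   
   To make the intended semantics concrete, here is a minimal sketch of a 
delete with an opt-in `recursive` flag. `FakeBucket` and its methods are 
hypothetical in-memory stand-ins rather than the real storage client, and the 
prefix normalization is my assumption, not something the PR already does:

```python
from typing import Dict, List


class FakeBucket:
    """Hypothetical in-memory stand-in for a bucket (not the real client)."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self.blobs = dict(blobs)

    def list_names(self, prefix: str) -> List[str]:
        # Return the names of all objects that start with the given prefix.
        return sorted(name for name in self.blobs if name.startswith(prefix))

    def delete_name(self, name: str) -> None:
        # Deleting a missing object is treated as a no-op in this sketch.
        self.blobs.pop(name, None)


def delete(bucket: FakeBucket, blob_name: str,
           recursive: bool = False) -> List[str]:
    """Deletes one object, or everything under a prefix if recursive=True.

    Returns the names that were deleted.
    """
    if recursive:
        # Assumption: normalize to a trailing slash so that "dir" does not
        # also match a sibling such as "dir2/...".
        prefix = blob_name if blob_name.endswith('/') else blob_name + '/'
        targets = bucket.list_names(prefix)
    else:
        # Default behavior is unchanged: only the exact object is removed.
        targets = [blob_name] if blob_name in bucket.blobs else []
    for name in targets:
        bucket.delete_name(name)
    return targets
```

   With `recursive` left at its default, a call on a prefix such as `dir/` 
touches nothing, which is what keeps existing callers unaffected.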
   
   To fix https://github.com/apache/beam/issues/27605, however, you will also 
need to make changes to `gcsfilesystem.py` to leverage the new functionality.
   
   For example, you can do something similar to 
https://github.com/apache/beam/pull/29477/files in `gcsfilesystem.delete()`:
   
   ```python
     for path in paths:
       if path.endswith('/'):
         # This is a directory. Remove all of its contents, including
         # objects and subdirectories, then move on to the next path.
         self._gcsIO().delete(path, recursive=True)
         continue
       else:
         ...
   ```
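   
   A self-contained version of that dispatch, for illustration only. 
`RecordingIO` and `delete_paths` are hypothetical names that record calls 
instead of touching GCS, and the `recursive` keyword is the parameter proposed 
in this PR, not an existing API. The sketch uses `continue` so that paths 
after a directory are still processed:

```python
from typing import Iterable, List, Tuple


class RecordingIO:
    """Hypothetical stand-in that records delete calls for inspection."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, bool]] = []

    def delete(self, path: str, recursive: bool = False) -> None:
        self.calls.append((path, recursive))


def delete_paths(io: RecordingIO, paths: Iterable[str]) -> None:
    """Deletes each path, treating a trailing slash as a directory."""
    for path in paths:
        if path.endswith('/'):
            # Directory: remove everything under the prefix, then keep
            # going so the remaining paths are not skipped.
            io.delete(path, recursive=True)
            continue
        # Plain object: delete just this one path.
        io.delete(path)


io = RecordingIO()
delete_paths(io, ['gs://b/dir/', 'gs://b/file.txt', 'gs://b/other/'])
```

   After this runs, `io.calls` holds one entry per input path, with the 
recursive flag set only for the two paths that end in `/`.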
   


