Abacn commented on code in PR #39008:
URL: https://github.com/apache/beam/pull/39008#discussion_r3437121969
##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -77,6 +77,24 @@ def default_gcs_bucket_name(project, region):
region, md5(project.encode('utf8')).hexdigest())
+def _get_project_number(project_id, credentials=None):
+  """Resolves a project ID to its project number using Cloud Resource Manager
+  API."""
+  from google.cloud import resourcemanager_v3
+  client = resourcemanager_v3.ProjectsClient(credentials=credentials)
+  project_info = client.get_project(name=f"projects/{project_id}")
+  # project_info.name is of the form "projects/PROJECT_NUMBER"
+  return int(project_info.name.split('/')[-1])
+
+
+def _validate_bucket_project(bucket, project_id, credentials=None):
+  """Verifies that the GCS bucket is owned by the executing project."""
+  bucket_project_number = bucket.project_number
+  project_number = _get_project_number(project_id, credentials=credentials)
+  if bucket_project_number != project_number:
+    raise ValueError(
+        f'Bucket gs://{bucket.name} is not owned by project {project_id}.')
Review Comment:
This may be a valid concern. The Python SDK is used not only in Beam but also
in the internal FlumePython, so we should probably handle mocks as well, to
avoid future back-and-forth over broken tests and production use cases.
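
One possible shape for mock handling, as a rough sketch only and not a proposal for the exact code: skip the ownership check when the bucket does not carry a real integer `project_number`, which is what happens when a test passes a `mock.Mock()` bucket. The name `validate_bucket_project` and the injected `get_project_number` callable are hypothetical; the callable stands in for the PR's `_get_project_number` so the sketch runs without GCP access.

```python
from types import SimpleNamespace
from unittest import mock


def validate_bucket_project(bucket, project_id, get_project_number):
  """Returns True if ownership was verified, False if the check was skipped.

  Raises ValueError only when both project numbers are known and differ.
  `get_project_number` is a hypothetical injected callable (an assumption of
  this sketch) mapping a project ID to its integer project number.
  """
  bucket_project_number = getattr(bucket, 'project_number', None)
  # A mocked bucket returns another Mock here, never an int, so the
  # comparison below would always fail. Skip the check in that case.
  if not isinstance(bucket_project_number, int):
    return False
  project_number = get_project_number(project_id)
  if bucket_project_number != project_number:
    raise ValueError(
        f'Bucket gs://{bucket.name} is not owned by project {project_id}.')
  return True


# Usage: a mocked bucket is skipped, a matching bucket is verified.
mocked_bucket = mock.Mock()
owned_bucket = SimpleNamespace(name='my-bucket', project_number=123)
skipped = validate_bucket_project(mocked_bucket, 'my-project', lambda _: 123)
verified = validate_bucket_project(owned_bucket, 'my-project', lambda _: 123)
```

Skipping silently weakens the check for any caller whose bucket object lacks a project number, so whether to skip, log, or fail in that case is a judgment call for the PR author.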