gemini-code-assist[bot] commented on code in PR #39130:
URL: https://github.com/apache/beam/pull/39130#discussion_r3485803726
##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -143,7 +144,20 @@ def get_or_create_default_gcs_bucket(options):
'Creating default GCS bucket for project %s: gs://%s',
project,
bucket_name)
- return gcs.create_bucket(bucket_name, project, location=region)
+ try:
+ return gcs.create_bucket(bucket_name, project, location=region)
+ except Conflict:
+ try:
+ bucket = gcs.get_bucket(bucket_name)
+ except Exception:
+ raise
+ if bucket:
+ _validate_bucket_project(
+ bucket,
+ project,
+ credentials=getattr(gcs.client, '_credentials', None))
+ return bucket
+ raise
Review Comment:

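The inner `try`/`except` around `gcs.get_bucket` is redundant. Catching
`Exception` and immediately re-raising it leaves behavior unchanged, because any
error from `get_bucket` propagates either way. All it adds is a level of nesting.
Removing it makes the `Conflict` handler easier to read, and the bare `raise` at
the end still re-raises the original `Conflict` when no bucket is returned.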
```python
try:
  return gcs.create_bucket(bucket_name, project, location=region)
except Conflict:
  bucket = gcs.get_bucket(bucket_name)
  if bucket:
    _validate_bucket_project(
        bucket,
        project,
        credentials=getattr(gcs.client, '_credentials', None))
    return bucket
  raise
```
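A minimal, self-contained sketch showing that the nested wrapper is a no-op. It runs the nested and the simplified handlers against a fake client and checks that both return or raise the same thing. `Conflict` here is a local stand-in for the exception the PR catches, `FakeGcs`, `with_redundant_wrapper`, `simplified`, and `outcome` are hypothetical helpers, and the `_validate_bucket_project` call is left out to keep the sketch standalone.

```python
class Conflict(Exception):
  """Hypothetical stand-in for the HTTP 409 Conflict exception caught in the PR."""


class FakeGcs:
  """Hypothetical test double: create_bucket always conflicts."""
  def __init__(self, existing=None, get_error=None):
    self.existing = existing    # what get_bucket returns (None = not found)
    self.get_error = get_error  # exception get_bucket raises, if any

  def create_bucket(self, bucket_name, project, location=None):
    raise Conflict('bucket %s already exists' % bucket_name)

  def get_bucket(self, bucket_name):
    if self.get_error is not None:
      raise self.get_error
    return self.existing


def with_redundant_wrapper(gcs, bucket_name, project, region):
  try:
    return gcs.create_bucket(bucket_name, project, location=region)
  except Conflict:
    try:
      bucket = gcs.get_bucket(bucket_name)
    except Exception:
      raise  # re-raises the same exception unchanged
    if bucket:
      return bucket
    raise  # re-raises the Conflict currently being handled


def simplified(gcs, bucket_name, project, region):
  try:
    return gcs.create_bucket(bucket_name, project, location=region)
  except Conflict:
    bucket = gcs.get_bucket(bucket_name)  # errors propagate as before
    if bucket:
      return bucket
    raise  # still re-raises the original Conflict


def outcome(fn, gcs):
  """Summarizes a call as ('returned', value) or ('raised', exception name)."""
  try:
    return ('returned', fn(gcs, 'b', 'proj', 'us-central1'))
  except Exception as e:
    return ('raised', type(e).__name__)


for gcs in (FakeGcs(existing='bucket-object'),
            FakeGcs(existing=None),
            FakeGcs(get_error=PermissionError('denied'))):
  print(outcome(with_redundant_wrapper, gcs), outcome(simplified, gcs))
```

The three fakes cover the three paths through the handler: the bucket exists, the lookup finds nothing (so `Conflict` is re-raised), and the lookup itself fails.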
##########
sdks/python/apache_beam/io/gcp/gcsio_integration_test.py:
##########
@@ -207,19 +208,17 @@ def test_create_default_bucket(self,
mock_default_gcs_bucket_name):
# requires this option unset.
google_cloud_options.dataflow_kms_key = None
- import random
- from hashlib import blake2b
-
- # Add a random number to avoid collision if multiple test instances
- # are run at the same time. To avoid too many dangling buckets if bucket
- # removal fails, we limit the max number of possible bucket names in this
- # test to 1000.
- overridden_bucket_name = 'gcsio-it-%d-%s-%s-%d' % (
- random.randint(0, 999),
+ # Add a unique uuid and the parameterized test options to the bucket name
+ # to avoid collisions when multiple parameterized instances run in parallel
+ # or concurrent CI jobs run at the same time.
+ overridden_bucket_name = 'gcsio-it-%s-%s-%s-%d-%s-%s' % (
+ uuid.uuid4().hex[:8],
google_cloud_options.region,
blake2b(google_cloud_options.project.encode('utf8'),
digest_size=4).hexdigest(),
- int(time.time()))
+ int(time.time()),
+ str(self.no_gcsio_throttling_counter).lower(),
+ str(self.enable_gcsio_blob_generation).lower())
Review Comment:

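GCS bucket names without dots are limited to 63 characters. Apart from the
region, the current format uses 48 to 50 characters: the `gcsio-it-` prefix (9),
an 8-character UUID fragment, an 8-character project hash, a 10-digit timestamp,
five separators, and two `true`/`false` flags (4 or 5 characters each). With a
long region such as `northamerica-northeast1` (23 characters), the name reaches
71 to 73 characters and bucket creation fails.
Shortening the UUID fragment to 6 characters, the project hash to 4 characters
(`digest_size=2`), and each flag to a single `1`/`0` brings the non-region part
down to 36 characters. The name is then 59 characters for
`northamerica-northeast1` and stays within the limit for any region name of up
to 27 characters.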
```suggestion
    overridden_bucket_name = 'gcsio-it-%s-%s-%s-%d-%s-%s' % (
        uuid.uuid4().hex[:6],
        google_cloud_options.region,
        blake2b(google_cloud_options.project.encode('utf8'),
                digest_size=2).hexdigest(),
        int(time.time()),
        '1' if self.no_gcsio_throttling_counter else '0',
        '1' if self.enable_gcsio_blob_generation else '0')
```
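To check the length limit described above, a small standalone sketch can rebuild both name formats outside the test class. `current_name`, `suggested_name`, and the project ID `my-project` are hypothetical; the boolean arguments stand in for `self.no_gcsio_throttling_counter` and `self.enable_gcsio_blob_generation`, and the sketch assumes a 10-digit Unix timestamp.

```python
import hashlib
import time
import uuid

MAX_BUCKET_NAME_LEN = 63  # limit for GCS bucket names that contain no dots


def current_name(region, project, no_throttling, blob_generation):
  # Format in the PR as written: 8-char uuid, 4-byte hash, 'true'/'false' flags.
  return 'gcsio-it-%s-%s-%s-%d-%s-%s' % (
      uuid.uuid4().hex[:8],
      region,
      hashlib.blake2b(project.encode('utf8'), digest_size=4).hexdigest(),
      int(time.time()),
      str(no_throttling).lower(),
      str(blob_generation).lower())


def suggested_name(region, project, no_throttling, blob_generation):
  # Suggested format: 6-char uuid, 2-byte hash, '1'/'0' flags.
  return 'gcsio-it-%s-%s-%s-%d-%s-%s' % (
      uuid.uuid4().hex[:6],
      region,
      hashlib.blake2b(project.encode('utf8'), digest_size=2).hexdigest(),
      int(time.time()),
      '1' if no_throttling else '0',
      '1' if blob_generation else '0')


region = 'northamerica-northeast1'  # 23 characters
project = 'my-project'  # hypothetical project ID
for flags in ((False, False), (True, True)):
  cur = current_name(region, project, *flags)
  sug = suggested_name(region, project, *flags)
  print(len(cur), len(sug), len(sug) <= MAX_BUCKET_NAME_LEN)
```

For this region the current format yields 73 characters with both flags false and 71 with both true, while the suggested format yields 59 regardless of the flags. The shorter hash and UUID fragment make collisions more likely in principle, but the timestamp and the remaining random characters still distinguish concurrent runs.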