hjtran opened a new issue, #37445:
URL: https://github.com/apache/beam/issues/37445
There's an inconsistency in how `FileSystems.get_filesystem()` handles
missing optional dependencies between GCS and S3.
### Current Behavior
**S3 (without `aws` extra):**
```python
>>> from apache_beam.io import filesystems
>>> filesystems.FileSystems.get_filesystem("s3://blah")
<apache_beam.io.aws.s3filesystem.S3FileSystem at 0x11a0af750>
```
Returns the filesystem object; validation happens later when the filesystem
is actually used.
**GCS (without `gcp` extra):**
```python
>>> from apache_beam.io import filesystems
>>> filesystems.FileSystems.get_filesystem("gcs://blah")
ValueError: Unable to get filesystem from specified path, please use the
correct path or ensure the required dependency is installed, e.g., pip install
apache-beam[gcp]. Path specified: gcs://blah
```
Raises `ValueError` immediately: without the `gcp` extra, `GCSFileSystem` isn't
registered as a filesystem subclass, so the lookup finds no match for the path.
### Proposed Behavior
Both should behave consistently. GCSFileSystem should be returned from
`get_filesystem()` like S3FileSystem, allowing callers to validate dependencies
when the filesystem is actually used rather than at lookup time.
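To make the proposal concrete, here is a minimal, self-contained sketch of the deferred-validation pattern. It does not use Beam's actual code: `FileSystem`, `LazyGCSFileSystem`, `get_filesystem`, and the `_required_module` attribute are hypothetical stand-ins, the scheme string simply mirrors the example path above, and the module name is an assumption chosen for illustration.

```python
import importlib


class FileSystem:
    """Hypothetical base class; subclasses are looked up by URI scheme."""

    @classmethod
    def scheme(cls):
        raise NotImplementedError


class LazyGCSFileSystem(FileSystem):
    """Hypothetical filesystem whose optional dependency is imported lazily."""

    # Assumed module name, used only for illustration.
    _required_module = "google.cloud.storage"

    @classmethod
    def scheme(cls):
        # Mirrors the example path in this issue (an assumption).
        return "gcs"

    def _client_module(self):
        # The import is deferred until the filesystem is actually used.
        try:
            return importlib.import_module(self._required_module)
        except ImportError as exc:
            raise ImportError(
                "GCS support requires: pip install apache-beam[gcp]"
            ) from exc

    def open(self, path):
        self._client_module()  # Dependency validation happens here.
        raise NotImplementedError("Real I/O is out of scope for this sketch")


def get_filesystem(path):
    """Resolve a filesystem by scheme without importing optional deps."""
    scheme = path.split("://", 1)[0]
    for cls in FileSystem.__subclasses__():
        if cls.scheme() == scheme:
            return cls()
    raise ValueError(f"No filesystem registered for path: {path}")
```

In this sketch the class is always registered, so `get_filesystem("gcs://blah")` returns an instance even when the optional package is missing, and the `ImportError` surfaces only on the first call to `open()`.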
### Why This Matters
- Inconsistent API behavior is confusing
- Code that handles multiple filesystem types can't treat a missing GCS
dependency the same way it treats a missing S3 dependency
- Dependency validation at usage time (not lookup time) allows for better
error handling and lazy loading patterns
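As an illustration of the caller-side impact, the sketch below shows the kind of wrapper that multi-filesystem code needs under the current behavior. `resolve_all` and its `lookup` parameter are hypothetical names; `lookup` stands for any callable that behaves like `FileSystems.get_filesystem`. Since the `ValueError` message shown above covers both a wrong path and a missing dependency, the wrapper can only record the message, not tell the two cases apart.

```python
def resolve_all(paths, lookup):
    """Split paths into those that resolve now and those that do not.

    `lookup` maps a path to a filesystem object and raises ValueError
    when no filesystem is registered for the path's scheme.
    """
    resolved, unavailable = {}, {}
    for path in paths:
        try:
            resolved[path] = lookup(path)
        except ValueError as exc:
            # Lookup-time failure: keep the message for later reporting.
            unavailable[path] = str(exc)
    return resolved, unavailable
```

With the proposed behavior, a wrapper like this would only need to handle genuinely unknown schemes, and missing dependencies could be handled at the point of use for every filesystem type.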
### Environment
- Apache Beam version: 2.70.0
- Python version: 3.11
---
*Generated by Claude Code, confirmed by @hjtran*