RyuSA opened a new issue, #31218: URL: https://github.com/apache/beam/issues/31218
### What would you like to happen?

I would like to make failures of the filesystem imports at the top level of `apache_beam.io.filesystems` easier to troubleshoot.

https://github.com/apache/beam/blob/v2.56.0/sdks/python/apache_beam/io/filesystems.py#L36-L59

AS-IS:

```python
try:
  from apache_beam.io.hadoopfilesystem import HadoopFileSystem
except ImportError:
  pass
```

PROPOSAL:

```python
try:
  from apache_beam.io.hadoopfilesystem import HadoopFileSystem
except ModuleNotFoundError:
  # The optional dependency is not installed; skipping silently is expected.
  pass
except ImportError as e:
  # The module was found but failed to import; surface the reason.
  _LOGGER.warning(
      "Failed to import HadoopFileSystem; loading of this filesystem "
      "will be skipped. Error details: %s", e)
```

For context, I encountered a problem when launching a Beam job on CentOS 7 with apache-beam[gcp]==2.55.0 installed. The error occurs when the job is launched, not during job execution.

```bash
$ python3 -m apache_beam.examples.wordcount \
    --input INPUT \
    --output OUTPUT \
    --runner DataflowRunner
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  ...
  File "/home/ryusa/venv/lib64/python3.8/site-packages/apache_beam/io/filesystems.py", line 103, in get_filesystem
    raise ValueError(
ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: ...
```

The error itself is raised on [this line](https://github.com/apache/beam/blob/v2.56.0/sdks/python/apache_beam/io/filesystems.py#L102) because `GCSFileSystem` failed to load at module initialization. That failure, in turn, happens because `GCSFileSystem` relies on the `requests` package, whose `urllib3` dependency requires OpenSSL 1.1.1 or later from version 2 onwards. CentOS 7 ships OpenSSL 1.0.2, so the behavior changed with Beam 2.55.0 and later. (This is not essential to the proposal, so I have not investigated it in detail.)
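As a quick way to check whether an environment is affected, the standard-library `ssl` module exposes the OpenSSL version that Python was compiled against. The snippet below is only a minimal sketch: the helper name `openssl_is_new_enough` is made up for illustration, and the `(1, 1, 1)` threshold is taken from the urllib3 v2 error message rather than from any Beam API.

```python
import ssl

# Minimum OpenSSL version named in the urllib3 v2 ImportError message.
# (Assumption for illustration; this constant is not part of Beam.)
MIN_OPENSSL = (1, 1, 1)


def openssl_is_new_enough(version_info=ssl.OPENSSL_VERSION_INFO):
    """Return True if the given OpenSSL version tuple is at least 1.1.1."""
    # OPENSSL_VERSION_INFO is a 5-tuple; only major, minor, fix are compared.
    return tuple(version_info[:3]) >= MIN_OPENSSL


if __name__ == "__main__":
    print("ssl module compiled against:", ssl.OPENSSL_VERSION)
    print("meets the 1.1.1 minimum:", openssl_is_new_enough())
```

On a host like the CentOS 7 machine described here, with an `ssl` module built against OpenSSL 1.0.2, the check would be expected to report that the minimum is not met.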
```python
$ python3
>>> from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ryusa/venv/lib64/python3.8/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 36, in <module>
    ...
    import urllib3
  File "/home/ryusa/venv/lib64/python3.8/site-packages/urllib3/__init__.py", line 42, in <module>
    raise ImportError(
ImportError: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips  26 Jan 2017'. See: https://github.com/urllib3/urllib3/issues/2168
```

I was able to resolve this quickly because I happened to know about these circumstances, but for the sake of future users it seems better not to simply suppress `ImportError` but to log a warning as well.

I can send a Pull Request. However, since it touches a core area, I have raised an issue first.

### Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

### Issue Components

- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
