RyuSA opened a new issue, #31218:
URL: https://github.com/apache/beam/issues/31218

   ### What would you like to happen?
   
   I would like to make the imports of the FileSystem implementations, which 
happen in the top-level code of `apache_beam.io.filesystems`, easier to 
troubleshoot.
   
   
https://github.com/apache/beam/blob/v2.56.0/sdks/python/apache_beam/io/filesystems.py#L36-L59
   
   AS-IS:
   
   ```python
   try:
     from apache_beam.io.hadoopfilesystem import HadoopFileSystem
   except ImportError:
     pass
   ```
   
   PROPOSAL:
   
   ```python
   try:
     from apache_beam.io.hadoopfilesystem import HadoopFileSystem
   except ModuleNotFoundError:
     pass
   except ImportError as e:
     _LOGGER.warning(
         "Failed to import HadoopFileSystem; loading of this filesystem "
         "will be skipped: %s", e)
   ```
   
   For context, I encountered a problem when launching a Beam job on CentOS 7 
with apache-beam[gcp]==2.55.0 installed. The error occurs when the job is 
launched, not during job execution.
   
   ```bash
   $ python3 -m apache_beam.examples.wordcount \
        --input INPUT \
        --output OUTPUT \
        --runner DataflowRunner 
   Traceback (most recent call last):
     File "/opt/rh/rh-python38/root/usr/lib64/python3.8/runpy.py", line 194, in _run_module_as_main
       return _run_code(code, main_globals, None,
   ...
     File "/home/ryusa/venv/lib64/python3.8/site-packages/apache_beam/io/filesystems.py", line 103, in get_filesystem
       raise ValueError(
   ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: ...
   ```
   
   The error itself is raised on [this 
line](https://github.com/apache/beam/blob/v2.56.0/sdks/python/apache_beam/io/filesystems.py#L102)
 and is caused by the failure to load `GCSFileSystem` at module initialization. 
This, in turn, is because `GCSFileSystem` relies on the `requests` package, 
which pulls in `urllib3`; as the traceback below shows, urllib3 v2 only 
supports OpenSSL 1.1.1+. CentOS 7 ships OpenSSL 1.0.2, so the behavior changed 
with Beam version 2.55.0 and later. (This is not essential to the request, so I 
have not investigated it in detail.)
   
   ```python
   $ python3
   >>> from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/ryusa/venv/lib64/python3.8/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 36, in <module>
   ...
       import urllib3
     File "/home/ryusa/venv/lib64/python3.8/site-packages/urllib3/__init__.py", line 42, in <module>
       raise ImportError(
   ImportError: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.0.2k-fips  26 Jan 2017'. See: https://github.com/urllib3/urllib3/issues/2168
   ```
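
To check an environment for this condition up front, the OpenSSL version that Python's `ssl` module was built against can be inspected directly. The sketch below is a simplified approximation of the requirement quoted in the traceback, not urllib3's own check; the function name `openssl_ok_for_urllib3_v2` is hypothetical, and builds linked against something other than OpenSSL (for example LibreSSL) are not handled.

```python
import ssl


def openssl_ok_for_urllib3_v2(version_info=ssl.OPENSSL_VERSION_INFO):
  """Hypothetical check: True if the linked OpenSSL is 1.1.1 or newer.

  `version_info` defaults to the 5-tuple exposed by the `ssl` module;
  only the first three components (major, minor, fix) are compared.
  """
  return tuple(version_info[:3]) >= (1, 1, 1)


# Print the human-readable version string and the result of the check.
print(ssl.OPENSSL_VERSION)
print(openssl_ok_for_urllib3_v2())
```

On the CentOS 7 setup described here, the first line would be expected to report the 1.0.2k build named in the traceback.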
   
   I was able to resolve this quickly because I happened to know about these 
circumstances, but for the benefit of future users it seems better to handle 
`ImportError` not by silently suppressing it, but by logging a warning.
   I can send a Pull Request. However, since it involves a change to a core 
area, I've raised an Issue first.
   
   
   ### Issue Priority
   
   Priority: 2 (default / most feature requests should be filed as P2)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
