KyzEver opened a new issue, #47805:
URL: https://github.com/apache/arrow/issues/47805

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   It appears that `S3FileSystem` does not respect the proxy environment variables (`HTTP_PROXY`, `HTTPS_PROXY`) and instead requires the `proxy_options` argument. If this is expected behavior, I would argue that it should default to the environment variables when they are set.
   
   ```python
   >>> from pyarrow import fs
   >>> system = fs.S3FileSystem()
   >>> system.get_file_info(file)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/_fs.pyx", line 615, in pyarrow._fs.FileSystem.get_file_info
     File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
   OSError: When getting information for key *** in bucket ***: AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 35, SSL connect error
   
   >>> import os
   >>> proxy = os.environ['HTTP_PROXY']
   >>> system = fs.S3FileSystem(proxy_options=proxy)
   >>> system.get_file_info(file)
   <FileInfo for *** type=FileType.File, size=5011554>
   ```
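
The working call above passes the proxy URI as a plain string. According to the pyarrow documentation, `proxy_options` also accepts a dict with the keys `scheme`, `host`, `port` (required) and `username`, `password` (optional). The following is a minimal sketch of converting a proxy URI into that dict form. The helper name `proxy_uri_to_options` is hypothetical, and refusing to guess a default port is an assumption of this sketch, not pyarrow behavior.

```python
from urllib.parse import unquote, urlsplit


def proxy_uri_to_options(uri):
    """Convert a proxy URI such as 'http://user:pw@host:8080' into the dict
    form accepted by S3FileSystem(proxy_options=...).

    Hypothetical helper: pyarrow can parse the URI string itself, so this
    only makes the individual fields explicit.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"proxy URI has no host: {uri!r}")
    if parts.port is None:
        # Assumption: do not guess a default port; the dict form requires one.
        raise ValueError(f"proxy URI has no port: {uri!r}")

    options = {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}
    # Credentials are optional and may be percent-encoded in the URI.
    if parts.username is not None:
        options["username"] = unquote(parts.username)
    if parts.password is not None:
        options["password"] = unquote(parts.password)
    return options


# Usage (requires pyarrow and a reachable proxy):
#   from pyarrow import fs
#   system = fs.S3FileSystem(proxy_options=proxy_uri_to_options("http://proxy.example:8080"))
```

Note that `urlsplit` lowercases the host name, so `host` is always returned in lowercase.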
   
   My workplace restricts access to external resources, and we use `HTTP_PROXY` and `HTTPS_PROXY` to work around this restriction when necessary. PyArrow is the first package I have come across that does not respect these proxy variables by default, so I assumed it was a convention that they would be used if present.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.