Timothy Luna created ARROW-16437:
------------------------------------

             Summary: Mocking tests with moto not currently feasible.
                 Key: ARROW-16437
                 URL: https://issues.apache.org/jira/browse/ARROW-16437
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 7.0.0
         Environment: Ubuntu environment
Python 3.9
PyArrow 7.0.0
moto 3.1.7
            Reporter: Timothy Luna


## Unable to use moto to mock S3 for testing purposes.

 

I've been using AWSWrangler as a loading utility in a custom application and am 
attempting to remove it as a dependency, because PyArrow Dataset provides all 
the S3 functionality I need.

 

When PyArrow resolves the FileSystem type from the S3 URI, it appears to 
sidestep moto, and the test fails with:

 

```sh

=================================== FAILURES ===================================
____________________________ test__pull_cached_data ____________________________

    @pytest.mark.usefixtures("s3")
    def test__pull_cached_data():
        """Tests pull cached data, both happy and sad."""
        # Here we're going to make a folder, transfer in some files,
        #   and pull them!
        with tempfile.TemporaryDirectory() as t:

            # This commented code functions.
            # from awswrangler.s3 import read_parquet
            # sillything = read_parquet('s3://test-bucket/test_metadata.parquet')
>           a = ds.dataset('s3://test-bucket/test_metadata.parquet')

tests/custom_application/loading/test_load.py:156: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:667: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:412: in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:373: in _ensure_single_source
    filesystem, path = _resolve_filesystem_and_path(path, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/fs.py:179: in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
pyarrow/_fs.pyx:350: in pyarrow._fs.FileSystem.from_uri
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error

pyarrow/error.pxi:114: OSError
---------------------------- Captured stderr setup -----------------------------
INFO:botocore.credentials:Found credentials in environment variables.
------------------------------ Captured log setup ------------------------------
INFO     botocore.credentials:credentials.py:1114 Found credentials in environment variables.
=========================== short test summary info ============================
FAILED tests/custom_application/loading/test_load.py::test__pull_cached_data - OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error
============================== 1 failed in 17.82s ==============================

```
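
A possible workaround, offered only as a rough sketch: moto's in-process mocks patch botocore, while PyArrow's S3 filesystem goes through the AWS C++ SDK, so the request in the traceback above is likely leaving the process and reaching for real AWS. One option may be to run moto in its standalone server mode and hand PyArrow an explicit `S3FileSystem` pointed at that server, so `FileSystem.from_uri` is never called. The helper names below (`s3_options_for_endpoint`, `split_s3_uri`, `open_mocked_dataset`), the dummy credentials, and the region are all hypothetical choices for illustration, not part of PyArrow or moto.

```python
from urllib.parse import urlparse


def s3_options_for_endpoint(endpoint_url, region="us-east-1"):
    """Translate a mock-server URL into S3FileSystem keyword arguments."""
    parsed = urlparse(endpoint_url)
    return {
        "endpoint_override": parsed.netloc,  # host:port, without the scheme
        "scheme": parsed.scheme,             # "http" for a local mock server
        "region": region,                    # given explicitly (assumed value)
        "access_key": "testing",             # dummy credentials (assumed values)
        "secret_key": "testing",
    }


def split_s3_uri(uri):
    """Turn 's3://bucket/key' into the 'bucket/key' form used with an explicit filesystem."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri}")
    return parsed.netloc + parsed.path


def open_mocked_dataset(uri, endpoint_url):
    """Open a dataset against a mock S3 endpoint instead of real AWS."""
    # Imported lazily so the two helpers above work without pyarrow installed.
    import pyarrow.dataset as ds
    from pyarrow import fs

    filesystem = fs.S3FileSystem(**s3_options_for_endpoint(endpoint_url))
    return ds.dataset(split_s3_uri(uri), filesystem=filesystem)
```

Assuming a moto server is listening on `http://localhost:5000` and the bucket has been populated through it, the failing line in the test could then become `a = open_mocked_dataset('s3://test-bucket/test_metadata.parquet', 'http://localhost:5000')`. I have not verified this against PyArrow 7.0.0 with moto 3.1.7.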

 

 

Please let me know if you need additional information!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
