[ https://issues.apache.org/jira/browse/ARROW-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Luna updated ARROW-16437:
---------------------------------
    Description: 
h2. Unable to use moto to mock S3 for testing purposes.

 

I've been using AWSWrangler as a loading utility in a custom application and am 
attempting to remove it as a dependency, because PyArrow Dataset provides all 
the S3 functionality I need.

 

The problem is that when PyArrow resolves the FileSystem type from the URI, it 
appears to bypass moto and fails with:

 
{code:java}
root@996521aaba72:/workspaces/custom_application# python -m pytest tests/custom_utilities/loading/test_load.py::test__pull_cached_data
============================= test session starts ==============================
platform linux -- Python 3.9.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /workspaces/custom_application, configfile: tox.ini
plugins: hypothesis-6.45.0, cov-3.0.0
collected 1 item

tests/custom_utilities/loading/test_load.py F                            [100%]

=================================== FAILURES ===================================
____________________________ test__pull_cached_data ____________________________

    @pytest.mark.usefixtures("s3")
    def test__pull_cached_data():
        """Tests pull cached data, both happy and sad."""
        # Here we're going to make a folder, transfer in some files,
        #   and pull them!
        with tempfile.TemporaryDirectory() as t:
            # This works fine
            # from awswrangler.s3 import read_parquet
            # sillything = read_parquet('s3://test-bucket/test_metadata.parquet')
>           a = ds.dataset('s3://test-bucket/test_metadata.parquet')

tests/custom_utilities/loading/test_load.py:156:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:667: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:412: in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:373: in _ensure_single_source
    filesystem, path = _resolve_filesystem_and_path(path, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/fs.py:179: in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
pyarrow/_fs.pyx:350: in pyarrow._fs.FileSystem.from_uri
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error

pyarrow/error.pxi:114: OSError
---------------------------- Captured stderr setup -----------------------------
INFO:botocore.credentials:Found credentials in environment variables.
------------------------------ Captured log setup ------------------------------
INFO     botocore.credentials:credentials.py:1114 Found credentials in environment variables.
=========================== short test summary info ============================
FAILED tests/custom_utilities/loading/test_load.py::test__pull_cached_data - OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error
============================== 1 failed in 17.69s ==============================
root@996521aaba72:/workspaces/custom_application#
{code}
 

If someone can tell me how to format the shell output above properly, please 
let me know. I think the key takeaway is 
`OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: 
curlCode: 35, SSL connect error`.

Please let me know if you need additional information!
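
For reference, here is a minimal sketch of one possible workaround. It is not confirmed anywhere in this issue. It rests on the assumption that moto's in-process mocks only intercept botocore (Python) calls, while PyArrow's S3 filesystem talks to S3 through its own C++ client, so PyArrow has to be pointed at a real HTTP endpoint. The sketch assumes a moto server is already listening at a local address such as http://127.0.0.1:5000 (for example, moto's standalone server mode). The helper names `moto_s3_kwargs` and `open_mocked_dataset`, the address, and the dummy credentials are all hypothetical.

```python
from urllib.parse import urlparse


def moto_s3_kwargs(endpoint_url, access_key="testing", secret_key="testing",
                   region="us-east-1"):
    """Build keyword arguments for pyarrow.fs.S3FileSystem.

    endpoint_url is an assumption: the address of a locally running
    moto server, e.g. "http://127.0.0.1:5000".
    """
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {endpoint_url!r}")
    return {
        # S3FileSystem takes "host:port" and the scheme as separate arguments.
        "endpoint_override": parsed.netloc,
        "scheme": parsed.scheme,
        # Dummy credentials; a mock server does not check them against AWS.
        "access_key": access_key,
        "secret_key": secret_key,
        # An explicit region means PyArrow does not need to look one up.
        "region": region,
    }


def open_mocked_dataset(path, endpoint_url):
    """Open a dataset through an explicitly configured S3FileSystem."""
    # Imported lazily so moto_s3_kwargs stays usable without pyarrow installed.
    import pyarrow.dataset as ds
    from pyarrow import fs

    s3 = fs.S3FileSystem(**moto_s3_kwargs(endpoint_url))
    # With an explicit filesystem, the path is given without the "s3://" prefix.
    return ds.dataset(path, filesystem=s3)
```

In the failing test this would replace the URI-based call with something like `open_mocked_dataset("test-bucket/test_metadata.parquet", "http://127.0.0.1:5000")`. Passing the filesystem explicitly avoids `FileSystem.from_uri`, which is the call that fails in the traceback above while resolving the bucket region.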



> [Python] Mocking tests with moto not currently feasible.
> --------------------------------------------------------
>
>                 Key: ARROW-16437
>                 URL: https://issues.apache.org/jira/browse/ARROW-16437
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 7.0.0
>         Environment: Ubuntu environment
> Python 3.9
> PyArrow 7.0.0
> moto 3.1.7
>            Reporter: Timothy Luna
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)