[ https://issues.apache.org/jira/browse/ARROW-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Luna updated ARROW-16437:
---------------------------------
    Description: 
Unable to use moto to mock S3 for testing purposes.

 

I've been using AWSWrangler as a loading utility in a custom application and am
attempting to remove it as a dependency, because PyArrow's Dataset API provides
all the S3 functionality I need.

 

The issue is that when PyArrow tries to determine the FileSystem for the
s3:// URI, it appears to sidestep moto entirely and fails with:

 
{code}
============================= FAILURES =============================
_______________________ test__pull_cached_data _______________________

    @pytest.mark.usefixtures("s3")
    def test__pull_cached_data():
        """Tests pull cached data, both happy and sad."""
        # Here we're going to make a folder, transfer in some files,
        #   and pull them!
        with tempfile.TemporaryDirectory() as t:
            # This commented code functions.
            # from awswrangler.s3 import read_parquet
            # sillything = read_parquet('s3://test-bucket/test_metadata.parquet')
>           a = ds.dataset('s3://test-bucket/test_metadata.parquet')

tests/custom_application/loading/test_load.py:156:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:667: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:412: in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:373: in _ensure_single_source
    filesystem, path = _resolve_filesystem_and_path(path, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/fs.py:179: in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
pyarrow/_fs.pyx:350: in pyarrow._fs.FileSystem.from_uri
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error

pyarrow/error.pxi:114: OSError
------------------------ Captured stderr setup ------------------------
INFO:botocore.credentials:Found credentials in environment variables.
------------------------- Captured log setup --------------------------
INFO     botocore.credentials:credentials.py:1114 Found credentials in environment variables.
======================== short test summary info ========================
FAILED tests/custom_application/loading/test_load.py::test__pull_cached_data - OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error
========================== 1 failed in 17.82s ==========================
{code}
 

(Trying to format shell output appropriately, please bear with me.)
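The {{s3}} fixture referenced by {{@pytest.mark.usefixtures("s3")}} isn't reproduced above; it is presumably a standard moto fixture along the lines of the hypothetical sketch below (fixture body, region and bucket contents here are illustrative, not the real code):

{code:python}
# Hypothetical stand-in for the "s3" fixture used by the test above.
# moto's mock_s3 intercepts botocore/boto3 calls made from the same process.
import boto3
import pytest
from moto import mock_s3


@pytest.fixture
def s3():
    with mock_s3():
        client = boto3.client("s3", region_name="us-east-1")
        client.create_bucket(Bucket="test-bucket")
        yield client
{code}

Since {{mock_s3}} works by patching botocore in-process, my understanding is that PyArrow's S3 filesystem, which is built on the AWS C++ SDK rather than botocore, never sees the mock and instead tries to resolve the bucket region against real AWS.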

Please let me know if you need additional information!
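A possible workaround (sketch only; the port, credentials and file names below are made up) is to run moto in server mode and hand PyArrow an explicit {{pyarrow.fs.S3FileSystem}} with {{endpoint_override}}, so the dataset never goes through {{FileSystem.from_uri}} and its bucket-region lookup:

{code:python}
# Sketch of a possible workaround: point pyarrow at a local moto server
# instead of relying on in-process mocking. Port, credentials and object
# names are illustrative.
import boto3
import pyarrow.dataset as ds
from moto.server import ThreadedMotoServer  # moto's standalone server mode
from pyarrow import fs

server = ThreadedMotoServer(port=5555)
server.start()
endpoint = "http://127.0.0.1:5555"

# Create the bucket and upload the test file against the local endpoint.
client = boto3.client(
    "s3",
    endpoint_url=endpoint,
    aws_access_key_id="testing",
    aws_secret_access_key="testing",
    region_name="us-east-1",
)
client.create_bucket(Bucket="test-bucket")
client.upload_file("test_metadata.parquet", "test-bucket", "test_metadata.parquet")

# Build the filesystem explicitly so no region lookup against real AWS happens.
s3 = fs.S3FileSystem(
    access_key="testing",
    secret_key="testing",
    endpoint_override=endpoint,
    scheme="http",
    region="us-east-1",
)
table = ds.dataset("test-bucket/test_metadata.parquet", filesystem=s3).to_table()

server.stop()
{code}

Passing {{filesystem=}} explicitly is what avoids the "When resolving region for bucket" error, since PyArrow then skips the region resolution step entirely.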



> [Python] Mocking tests with moto not currently feasible.
> --------------------------------------------------------
>
>                 Key: ARROW-16437
>                 URL: https://issues.apache.org/jira/browse/ARROW-16437
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 7.0.0
>         Environment: Ubuntu environment
> Python 3.9
> PyArrow 7.0.0
> moto 3.1.7
>            Reporter: Timothy Luna
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
