[ https://issues.apache.org/jira/browse/ARROW-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Luna updated ARROW-16437:
---------------------------------
    Description:
h2. Unable to use moto to mock S3 for testing purposes.

I've been using AWSWrangler as a loading utility in a custom application and am attempting to remove it as a dependency because PyArrow Dataset is capable of providing all the S3 functionality I need.

The issue stems from the fact that when PyArrow attempts to determine the FileSystem type, it appears to be sidestepping moto and is failing with:

{code:java}
root@996521aaba72:/workspaces/custom_application# python -m pytest tests/custom_utilities/loading/test_load.py::test__pull_cached_data
============================= test session starts ==============================
platform linux -- Python 3.9.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /workspaces/custom_application, configfile: tox.ini
plugins: hypothesis-6.45.0, cov-3.0.0
collected 1 item

tests/custom_utilities/loading/test_load.py F                            [100%]

=================================== FAILURES ===================================
____________________________ test__pull_cached_data ____________________________

    @pytest.mark.usefixtures("s3")
    def test__pull_cached_data():
        """Tests pull cached data, both happy and sad."""
        # Here we're going to make a folder, transfer in some files,
        # and pull them!
        with tempfile.TemporaryDirectory() as t:
            # This works fine
            # from awswrangler.s3 import read_parquet
            # sillything = read_parquet('s3://test-bucket/test_metadata.parquet')
>           a = ds.dataset('s3://test-bucket/test_metadata.parquet')

tests/custom_utilities/loading/test_load.py:156:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:667: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:412: in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:373: in _ensure_single_source
    filesystem, path = _resolve_filesystem_and_path(path, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/fs.py:179: in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
pyarrow/_fs.pyx:350: in pyarrow._fs.FileSystem.from_uri
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error

pyarrow/error.pxi:114: OSError
---------------------------- Captured stderr setup -----------------------------
INFO:botocore.credentials:Found credentials in environment variables.
------------------------------ Captured log setup ------------------------------
INFO     botocore.credentials:credentials.py:1114 Found credentials in environment variables.
=========================== short test summary info ============================
FAILED tests/custom_utilities/loading/test_load.py::test__pull_cached_data - OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error
============================== 1 failed in 17.69s ==============================
root@996521aaba72:/workspaces/custom_application#
{code}

If someone is willing to tell me how I should format the shell output above, please let me know, but I think the takeaway is `OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error`.

Please let me know if you need additional information!

> [Python] Mocking tests with moto not currently feasible.
> --------------------------------------------------------
>
>                 Key: ARROW-16437
>                 URL: https://issues.apache.org/jira/browse/ARROW-16437
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 7.0.0
>         Environment: Ubuntu environment
>                      Python 3.9
>                      PyArrow 7.0.0
>                      moto 3.1.7
>            Reporter: Timothy Luna
>            Priority: Major
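
A possible workaround for the failure reported in this issue, offered as a sketch under assumptions rather than a confirmed fix: PyArrow's S3 support is implemented in C++ on top of the AWS SDK for C++, so moto's in-process patching of botocore does not intercept its requests. Passing an explicit filesystem that points at a moto server running in standalone mode may avoid the bucket-region lookup that `FileSystem.from_uri` performs. The endpoint `127.0.0.1:5000`, the dummy credentials, and the helper names `strip_s3_scheme` and `open_mocked_dataset` are hypothetical and do not come from the original report.

```python
from urllib.parse import urlsplit


def strip_s3_scheme(uri: str) -> str:
    """Turn 's3://bucket/key' into the 'bucket/key' form that an
    explicitly passed PyArrow filesystem expects."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return f"{parts.netloc}{parts.path}"


def open_mocked_dataset(uri: str, endpoint: str = "127.0.0.1:5000"):
    """Open a dataset against a mock S3 endpoint.

    Assumption: a moto server is already listening on `endpoint` over
    plain HTTP and the bucket and object have been created there.
    """
    # Imported lazily so strip_s3_scheme can be used without PyArrow installed.
    import pyarrow.dataset as ds
    from pyarrow import fs

    mock_fs = fs.S3FileSystem(
        access_key="testing",        # dummy credentials (assumption)
        secret_key="testing",
        region="us-east-1",          # explicit region, so no lookup is needed
        scheme="http",               # the mock endpoint is assumed not to use TLS
        endpoint_override=endpoint,  # send requests to the mock, not to AWS
    )
    # With an explicit filesystem, the source is a bucket/key path, not a URI.
    return ds.dataset(strip_s3_scheme(uri), filesystem=mock_fs)
```

In the failing test, `ds.dataset('s3://test-bucket/test_metadata.parquet')` would then become `open_mocked_dataset('s3://test-bucket/test_metadata.parquet')`, provided the fixture writes the test data to the same mock endpoint.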
-- This message was sent by Atlassian Jira (v8.20.7#820007)