[ https://issues.apache.org/jira/browse/ARROW-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Luna updated ARROW-16437:
---------------------------------
Description: 

Unable to use moto to mock S3 for testing purposes.

I've been using AWSWrangler as a loading utility in a custom application and am attempting to remove it as a dependency, because PyArrow Dataset is capable of providing all the S3 functionality I need.

The issue is that when PyArrow attempts to determine the FileSystem type, it appears to sidestep moto and fails with:

{code:java}
=================================== FAILURES ===================================
___________________________ test__pull_cached_data ____________________________

    @pytest.mark.usefixtures("s3")
    def test__pull_cached_data():
        """Tests pull cached data, both happy and sad."""
        # Here we're going to make a folder, transfer in some files,
        # and pull them!
        with tempfile.TemporaryDirectory() as t:
            # This commented code functions.
            # from awswrangler.s3 import read_parquet
            # sillything = read_parquet('s3://test-bucket/test_metadata.parquet')
>           a = ds.dataset('s3://test-bucket/test_metadata.parquet')

tests/custom_application/loading/test_load.py:156:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:667: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:412: in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:373: in _ensure_single_source
    filesystem, path = _resolve_filesystem_and_path(path, filesystem)
/usr/local/lib/python3.9/site-packages/pyarrow/fs.py:179: in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
pyarrow/_fs.pyx:350: in pyarrow._fs.FileSystem.from_uri
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error

pyarrow/error.pxi:114: OSError
---------------------------- Captured stderr setup ----------------------------
INFO:botocore.credentials:Found credentials in environment variables.
------------------------------ Captured log setup -----------------------------
INFO     botocore.credentials:credentials.py:1114 Found credentials in environment variables.
============================ short test summary info ===========================
FAILED tests/custom_application/loading/test_load.py::test__pull_cached_data - OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error
=============================== 1 failed in 17.82s =============================
{code}

(Trying to format shell output appropriately, please bear with me.)

Please let me know if you need additional information!
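In case it helps narrow things down, the failure does not need the Dataset layer at all; the traceback bottoms out in {{FileSystem.from_uri}}. Below is a minimal, untested sketch (bucket and key names are just placeholders): with moto's in-process {{mock_s3}} active and the bucket created through boto3, the pyarrow call would still be expected to go out to the network, since, as far as I can tell, pyarrow's S3 filesystem is built on the AWS SDK for C++ rather than botocore, so moto's botocore patching never sees the request.

{code:python}
# Untested, minimal sketch -- bucket and key names are placeholders.
import boto3
import pyarrow.fs as pafs
from moto import mock_s3


@mock_s3
def check_from_uri():
    # The bucket exists as far as boto3/moto are concerned.
    boto3.client(
        "s3",
        region_name="us-east-1",
        aws_access_key_id="testing",
        aws_secret_access_key="testing",
    ).create_bucket(Bucket="test-bucket")

    # pyarrow talks to S3 through the AWS SDK for C++, not botocore, so this
    # call is not intercepted by moto and tries to resolve the bucket region
    # against the real network.
    try:
        filesystem, path = pafs.FileSystem.from_uri(
            "s3://test-bucket/test_metadata.parquet"
        )
    except OSError as exc:
        print(f"from_uri failed despite moto being active: {exc}")


check_from_uri()
{code}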
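As a possible testing workaround (not a fix): run moto in its standalone server mode and construct the S3FileSystem explicitly, so nothing has to be resolved from the URI. This is only a rough, untested sketch; the port, bucket name, and credentials are placeholders, and it assumes moto's ThreadedMotoServer (the standalone server shipped with recent moto releases) plus pyarrow's {{endpoint_override}} option.

{code:python}
# Rough, untested sketch of a pytest-style workaround -- all names are placeholders.
import boto3
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq
import pytest
from moto.server import ThreadedMotoServer
from pyarrow import fs

HOST, PORT = "127.0.0.1", 5555  # placeholder address for the sketch


@pytest.fixture
def moto_s3_server():
    # Moto runs as a real HTTP endpoint instead of patching botocore in-process,
    # so pyarrow's C++ S3 client can reach it as well.
    server = ThreadedMotoServer(ip_address=HOST, port=PORT)
    server.start()
    yield f"http://{HOST}:{PORT}"
    server.stop()


def test__pull_cached_data_workaround(moto_s3_server):
    # Point boto3 at the moto server and create the bucket there.
    boto3.client(
        "s3",
        region_name="us-east-1",
        endpoint_url=moto_s3_server,
        aws_access_key_id="testing",
        aws_secret_access_key="testing",
    ).create_bucket(Bucket="test-bucket")

    # Point pyarrow at the same server; no region lookup against real AWS.
    s3 = fs.S3FileSystem(
        access_key="testing",
        secret_key="testing",
        region="us-east-1",
        scheme="http",
        endpoint_override=f"{HOST}:{PORT}",
    )

    # Seed a small parquet file through the same filesystem.
    with s3.open_output_stream("test-bucket/test_metadata.parquet") as out:
        pq.write_table(pa.table({"x": [1, 2, 3]}), out)

    # Pass the filesystem explicitly so FileSystem.from_uri is never involved.
    dataset = ds.dataset(
        "test-bucket/test_metadata.parquet", format="parquet", filesystem=s3
    )
    assert dataset.to_table().num_rows == 3
{code}

Server mode keeps the mock inside the test process (it runs on a background thread) while still being reachable by pyarrow's C++ client.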
> [Python] Mocking tests with moto not currently feasible.
> --------------------------------------------------------
>
>                 Key: ARROW-16437
>                 URL: https://issues.apache.org/jira/browse/ARROW-16437
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 7.0.0
>         Environment: Ubuntu environment
>               Python 3.9
>               PyArrow 7.0.0
>               moto 3.1.7
>            Reporter: Timothy Luna
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)