[
https://issues.apache.org/jira/browse/ARROW-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Antoine Pitrou closed ARROW-16437.
----------------------------------
Resolution: Won't Fix
This cannot be fixed on the Arrow side: our S3 filesystem is implemented in
C++, so its requests never pass through the Python libraries (such as
botocore) that moto patches.
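
A possible workaround, offered as a sketch rather than as part of this
resolution: moto can also run as a standalone HTTP server, which a C++ client
can reach like any other S3-compatible endpoint. The example assumes such a
server is already listening on http://127.0.0.1:5000 and that test-bucket has
been populated there. The helper names, the address, and the dummy credentials
are hypothetical; endpoint_override, scheme, access_key and secret_key are
existing pyarrow.fs.S3FileSystem parameters.

```python
from urllib.parse import urlparse


def s3_kwargs_for_endpoint(endpoint_url, access_key="testing", secret_key="testing"):
    """Turn a mock-server URL into keyword arguments for pyarrow.fs.S3FileSystem.

    The dummy credentials are an assumption; a mock server does not validate them.
    """
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {endpoint_url!r}")
    return {
        "endpoint_override": parsed.netloc,  # host:port, without the scheme
        "scheme": parsed.scheme,             # plain http skips the SSL handshake
        "access_key": access_key,
        "secret_key": secret_key,
    }


def open_mock_dataset(endpoint_url, path):
    """Open a dataset through an explicitly constructed S3FileSystem.

    Requires pyarrow and a running S3-compatible server at endpoint_url.
    """
    import pyarrow.dataset as ds
    from pyarrow import fs

    s3 = fs.S3FileSystem(**s3_kwargs_for_endpoint(endpoint_url))
    # With an explicit filesystem, the path carries no "s3://" prefix.
    return ds.dataset(path, filesystem=s3)


# Hypothetical usage inside a test, once the mock server is up:
# dataset = open_mock_dataset("http://127.0.0.1:5000",
#                             "test-bucket/test_metadata.parquet")
```

Passing the filesystem explicitly should also keep PyArrow from calling
FileSystem.from_uri, which is where the region lookup in the quoted traceback
fails.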
> [Python] Mocking S3 tests with moto not currently feasible
> ----------------------------------------------------------
>
> Key: ARROW-16437
> URL: https://issues.apache.org/jira/browse/ARROW-16437
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 7.0.0
> Environment: Ubuntu environment
> Python 3.9
> PyArrow 7.0.0
> moto 3.1.7
> Reporter: Timothy Luna
> Priority: Major
>
> h2. Unable to use moto to mock S3 for testing purposes.
>
> I've been using AWSWrangler as a loading utility in a custom application and
> am attempting to remove it as a dependency because PyArrow Dataset can
> provide all the S3 functionality I need.
>
> When PyArrow resolves the filesystem for an s3:// URI, it appears to bypass
> moto and fails with:
> {code:java}
> root@996521aaba72:/workspaces/custom_application# python -m pytest tests/custom_utilities/loading/test_load.py::test__pull_cached_data
> ============================= test session starts ==============================
> platform linux -- Python 3.9.10, pytest-7.1.2, pluggy-1.0.0
> rootdir: /workspaces/custom_application, configfile: tox.ini
> plugins: hypothesis-6.45.0, cov-3.0.0
> collected 1 item
>
> tests/custom_utilities/loading/test_load.py F                            [100%]
>
> =================================== FAILURES ===================================
> ____________________________ test__pull_cached_data ____________________________
>
>     @pytest.mark.usefixtures("s3")
>     def test__pull_cached_data():
>         """Tests pull cached data, both happy and sad."""
>         # Here we're going to make a folder, transfer in some files,
>         # and pull them!
>         with tempfile.TemporaryDirectory() as t:
>             # This works fine
>             # from awswrangler.s3 import read_parquet
>             # sillything = read_parquet('s3://test-bucket/test_metadata.parquet')
> >           a = ds.dataset('s3://test-bucket/test_metadata.parquet')
>
> tests/custom_utilities/loading/test_load.py:156:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> /usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:667: in dataset
>     return _filesystem_dataset(source, **kwargs)
> /usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:412: in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
> /usr/local/lib/python3.9/site-packages/pyarrow/dataset.py:373: in _ensure_single_source
>     filesystem, path = _resolve_filesystem_and_path(path, filesystem)
> /usr/local/lib/python3.9/site-packages/pyarrow/fs.py:179: in _resolve_filesystem_and_path
>     filesystem, path = FileSystem.from_uri(path)
> pyarrow/_fs.pyx:350: in pyarrow._fs.FileSystem.from_uri
>     ???
> pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> >   ???
> E   OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error
>
> pyarrow/error.pxi:114: OSError
> ---------------------------- Captured stderr setup -----------------------------
> INFO:botocore.credentials:Found credentials in environment variables.
> ------------------------------ Captured log setup ------------------------------
> INFO     botocore.credentials:credentials.py:1114 Found credentials in environment variables.
> =========================== short test summary info ============================
> FAILED tests/custom_utilities/loading/test_load.py::test__pull_cached_data - OSError: When resolving region for bucket 'test-bucket': AWS Error [code 99]: curlCode: 35, SSL connect error
> ============================== 1 failed in 17.69s ==============================
> root@996521aaba72:/workspaces/custom_application#
> {code}
>
> If someone is willing to tell me what I should be doing to appropriately
> format the shell output above, please let me know, but I think the takeaway
> is `OSError: When resolving region for bucket 'test-bucket': AWS Error [code
> 99]: curlCode: 35, SSL connect error`.
> Please let me know if you need additional information!