blackfaced opened a new issue, #8024:
URL: https://github.com/apache/gravitino/issues/8024
### Version
main branch
### Describe what's wrong
When debugging a fileset locally with the Python client, as described in the
Gravitino documentation, I found that the `s3_endpoint` option in the `options`
dictionary does not take effect. Even after setting `s3_endpoint` to an external
address, the client still tries to connect to the internal endpoint.
For example, I set:
```python
options = {
    "s3_endpoint": "https://external-s3-endpoint.com",
    ...
}
```
But the error message shows that the client is still trying to connect to the
internal endpoint:

```
Connect timeout on endpoint URL: "https://internal-s3-endpoint.com/xxxxxx"
```

Please see the attached (redacted) error log for details.
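For the option to have any effect, the gvfs layer must translate user-facing option keys like `s3_endpoint` into the keyword arguments it passes to the underlying `s3fs.S3FileSystem` (where an endpoint override goes through `client_kwargs["endpoint_url"]`). Below is a minimal, hypothetical sketch of such a translation; the helper name `to_s3fs_kwargs` and the mapping are my assumptions for illustration, not Gravitino's actual internals:

```python
# Hypothetical sketch of translating gvfs-style options into s3fs kwargs.
# The option keys mirror this issue's example; the mapping itself is an
# assumption, not Gravitino's real code.

# gvfs-style option key -> s3fs constructor kwarg
_S3_OPTION_MAP = {
    "s3_access_key_id": "key",
    "s3_secret_access_key": "secret",
}

def to_s3fs_kwargs(options: dict) -> dict:
    """Translate client options into s3fs.S3FileSystem keyword arguments.

    The endpoint must land in client_kwargs["endpoint_url"]; if it is
    silently dropped here, botocore falls back to its default endpoint,
    which would produce exactly the symptom described in this issue.
    """
    kwargs = {}
    for opt_key, fs_key in _S3_OPTION_MAP.items():
        if opt_key in options:
            kwargs[fs_key] = options[opt_key]
    if "s3_endpoint" in options:
        kwargs["client_kwargs"] = {"endpoint_url": options["s3_endpoint"]}
    return kwargs

kwargs = to_s3fs_kwargs({"s3_endpoint": "https://external-s3-endpoint.com"})
# kwargs["client_kwargs"]["endpoint_url"] now carries the external address
```

If the real translation path drops or overrides `s3_endpoint` before reaching s3fs, that would explain the behavior I'm seeing.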
### Error message and/or stacktrace
```python
.venv/lib/python3.9/site-packages/gravitino/filesystem/gvfs.py:169: in exists
    result = decorated_exists(new_path, **kwargs)
.venv/lib/python3.9/site-packages/gravitino/filesystem/gvfs.py:104: in wrapper
    return func(*args, **kwargs)
.venv/lib/python3.9/site-packages/gravitino/filesystem/gvfs_default_operations.py:123: in exists
    return actual_fs.exists(
.venv/lib/python3.9/site-packages/fsspec/asyn.py:118: in wrapper
    return sync(self.loop, func, *args, **kwargs)
.venv/lib/python3.9/site-packages/fsspec/asyn.py:103: in sync
    raise return_result
.venv/lib/python3.9/site-packages/fsspec/asyn.py:56: in _runner
    result[0] = await coro
.venv/lib/python3.9/site-packages/s3fs/core.py:1058: in _exists
    await self._info(path, bucket, key, version_id=version_id)
.venv/lib/python3.9/site-packages/s3fs/core.py:1371: in _info
    out = await self._call_s3(
.venv/lib/python3.9/site-packages/s3fs/core.py:362: in _call_s3
    return await _error_wrapper(
.venv/lib/python3.9/site-packages/s3fs/core.py:142: in _error_wrapper
    raise err
.venv/lib/python3.9/site-packages/s3fs/core.py:113: in _error_wrapper
    return await func(*args, **kwargs)
.venv/lib/python3.9/site-packages/aiobotocore/context.py:36: in wrapper
    return await func(*args, **kwargs)
.venv/lib/python3.9/site-packages/aiobotocore/client.py:406: in _make_api_call
    http, parsed_response = await self._make_request(
.venv/lib/python3.9/site-packages/aiobotocore/client.py:432: in _make_request
    return await self._endpoint.make_request(
.venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:120: in _send_request
    while await self._needs_retry(
.venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:280: in _needs_retry
    responses = await self._event_emitter.emit(
.venv/lib/python3.9/site-packages/aiobotocore/hooks.py:68: in _emit
    response = await resolve_awaitable(handler(**kwargs))
.venv/lib/python3.9/site-packages/aiobotocore/_helpers.py:6: in resolve_awaitable
    return await obj
.venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:107: in _call
    if await resolve_awaitable(self._checker(**checker_kwargs)):
.venv/lib/python3.9/site-packages/aiobotocore/_helpers.py:6: in resolve_awaitable
    return await obj
.venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:126: in _call
    should_retry = await self._should_retry(
.venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:165: in _should_retry
    return await resolve_awaitable(
.venv/lib/python3.9/site-packages/aiobotocore/_helpers.py:6: in resolve_awaitable
    return await obj
.venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:174: in _call
    checker(attempt_number, response, caught_exception)
.venv/lib/python3.9/site-packages/botocore/retryhandler.py:247: in __call__
    return self._check_caught_exception(
.venv/lib/python3.9/site-packages/botocore/retryhandler.py:416: in _check_caught_exception
    raise caught_exception
.venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:201: in _do_get_response
    http_response = await self._send(request)
.venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:303: in _send
    return await self.http_session.send(request)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <aiobotocore.httpsession.AIOHTTPSession object at 0x114dcd8b0>
request = <AWSPreparedRequest stream_output=False, method=HEAD, url=https://tos-s3-cn-beijing.ivolces.com/hc-dev/test_fileset_it...7a4398025f', 'amz-sdk-invocation-id': b'b6b522ec-0f7c-47bb-8c7d-bb8018289273', 'amz-sdk-request': b'attempt=5; max=5'}>

async def send(self, request):
    try:
        proxy_url = self._proxy_config.proxy_url_for(request.url)
        proxy_headers = self._proxy_config.proxy_headers_for(request.url)
        url = request.url
        headers = request.headers
        data = request.body
        if ensure_boolean(
            os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
        ):
            # This is currently an "experimental" feature which provides
            # no guarantees of backwards compatibility. It may be subject
            # to change or removal in any patch version. Anyone opting in
            # to this feature should strictly pin botocore.
            host = urlparse(request.url).hostname
            proxy_headers['host'] = host
        headers_ = CIMultiDict(
            (z[0], _text(z[1], encoding='utf-8')) for z in headers.items()
        )
        # https://github.com/boto/botocore/issues/1255
        headers_['Accept-Encoding'] = 'identity'
        if isinstance(data, io.IOBase):
            data = _IOBaseWrapper(data)
        url = URL(url, encoded=True)
        session = await self._get_session(proxy_url)
        response = await session.request(
            request.method,
            url=url,
            chunked=self._chunked(headers_),
            headers=headers_,
            data=data,
            proxy=proxy_url,
            proxy_headers=proxy_headers,
        )
        # botocore converts keys to str, so make sure that they are in
        # the expected case. See detailed discussion here:
        # https://github.com/aio-libs/aiobotocore/pull/116
        # aiohttp's CIMultiDict camel cases the headers :(
        headers = {
            k.decode('utf-8').lower(): v.decode('utf-8')
            for k, v in response.raw_headers
        }
        http_response = aiobotocore.awsrequest.AioAWSResponse(
            str(response.url), response.status, headers, response
        )
        if not request.stream_output:
            # Cause the raw stream to be exhausted immediately. We do it
            # this way instead of using preload_content because
            # preload_content will never buffer chunked responses
            await http_response.content
        return http_response
    except ClientSSLError as e:
        raise SSLError(endpoint_url=request.url, error=e)
    except (ClientProxyConnectionError, ClientHttpProxyError) as e:
        raise ProxyConnectionError(
            proxy_url=mask_proxy_url(proxy_url), error=e
        )
    except (
        ServerDisconnectedError,
        aiohttp.ClientPayloadError,
        aiohttp.http_exceptions.BadStatusLine,
    ) as e:
        raise ConnectionClosedError(
            error=e, request=request, endpoint_url=request.url
        )
    except ServerTimeoutError as e:
        if str(e).lower().startswith('connect'):
>           raise ConnectTimeoutError(endpoint_url=request.url, error=e)
E           botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://tos-s3-cn-beijing.ivolces.com/xxx/test_fileset_it/test_fileset_schema/test_fileset"

.venv/lib/python3.9/site-packages/aiobotocore/httpsession.py:268: ConnectTimeoutError
```
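The request headers in the trace (`amz-sdk-request: attempt=5; max=5`) show botocore exhausting all five retry attempts, each waiting out the full connect timeout. Before digging into client configuration, it can help to confirm which endpoints are even reachable from the local machine. A stdlib-only sketch (the URLs in the example are placeholders, not the real redacted endpoints):

```python
# Stdlib-only reachability check: verify plain TCP connectivity to an
# endpoint's host/port before blaming client-side configuration.
import socket
from urllib.parse import urlparse

def endpoint_host_port(endpoint_url: str) -> tuple:
    """Extract (host, port) from an endpoint URL, defaulting port by scheme."""
    parsed = urlparse(endpoint_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port

def can_connect(endpoint_url: str, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to the endpoint succeeds."""
    host, port = endpoint_host_port(endpoint_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder URL): can_connect("https://external-s3-endpoint.com")
```

In my case the internal endpoint is unreachable from the local machine, which is why every attempt ends in a connect timeout.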
### How to reproduce

Gravitino version: 0.9.1
Python version: 3.9.6
Environment: local macOS, Python virtual environment

Steps:

1. Follow the Gravitino documentation to set up and debug a fileset using the Python client.
2. In the `options` dictionary, set `s3_endpoint` to an external S3 address.
3. Run the test, e.g. `pytest -v test_fileset.py`.
4. Observe that the client still attempts to connect to the internal S3 endpoint, resulting in a connection timeout.
### Additional context
It seems that the `s3_endpoint` option in the Python client is not being
respected: the internal endpoint is used regardless of the value set in
`options`. Has anyone encountered this issue, or does anyone know how to force
the client to use the specified external endpoint?
If more environment details or configuration files are needed, please let me
know.
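One possible explanation, which is purely my assumption: the gvfs client may resolve the storage endpoint from the server-side catalog properties rather than from the client-side `options`, in which case the endpoint configured when the catalog was created would always win. For reference, a catalog of that shape might carry properties like the following (property names are my best reading of the Gravitino S3 fileset docs; values are placeholders):

```python
# Hypothetical server-side catalog properties for an S3-backed fileset
# catalog. If the client resolves the endpoint from here instead of from
# the client-side options, the internal address would win regardless of
# what the client passes. Names and values are illustrative placeholders.
catalog_properties = {
    "filesystem-providers": "s3",
    "s3-endpoint": "https://internal-s3-endpoint.com",  # server-side value
    "s3-access-key-id": "...",
    "s3-secret-access-key": "...",
}
```

If that is the intended precedence, it would be good to have it documented; if not, the client-side `s3_endpoint` appears to be dropped somewhere along the way.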