blackfaced opened a new issue, #8024:
URL: https://github.com/apache/gravitino/issues/8024

   ### Version
   
   main branch
   
   ### Describe what's wrong
   
   When debugging a fileset locally using the Python client as described in 
Gravitino documentation, I found that the s3_endpoint option in the options 
dictionary does not take effect. Even after setting s3_endpoint to an external 
address, the client still tries to connect to the internal endpoint.
   For example, I set:
   PYTHON
   options = {
       "s3_endpoint": "https://external-s3-endpoint.com";,
       ...
   }
   But the error message shows the client is still trying to connect to the 
internal endpoint:
   Connect timeout on endpoint URL: "https://internal-s3-endpoint.com/xxxxxx";
   Please see the attached (redacted) error log for details.
   
   ### Error message and/or stacktrace
   
   ```python
   .venv/lib/python3.9/site-packages/gravitino/filesystem/gvfs.py:169: in exists
       result = decorated_exists(new_path, **kwargs)
   .venv/lib/python3.9/site-packages/gravitino/filesystem/gvfs.py:104: in 
wrapper
       return func(*args, **kwargs)
   
.venv/lib/python3.9/site-packages/gravitino/filesystem/gvfs_default_operations.py:123:
 in exists
       return actual_fs.exists(
   .venv/lib/python3.9/site-packages/fsspec/asyn.py:118: in wrapper
       return sync(self.loop, func, *args, **kwargs)
   .venv/lib/python3.9/site-packages/fsspec/asyn.py:103: in sync
       raise return_result
   .venv/lib/python3.9/site-packages/fsspec/asyn.py:56: in _runner
       result[0] = await coro
   .venv/lib/python3.9/site-packages/s3fs/core.py:1058: in _exists
       await self._info(path, bucket, key, version_id=version_id)
   .venv/lib/python3.9/site-packages/s3fs/core.py:1371: in _info
       out = await self._call_s3(
   .venv/lib/python3.9/site-packages/s3fs/core.py:362: in _call_s3
       return await _error_wrapper(
   .venv/lib/python3.9/site-packages/s3fs/core.py:142: in _error_wrapper
       raise err
   .venv/lib/python3.9/site-packages/s3fs/core.py:113: in _error_wrapper
       return await func(*args, **kwargs)
   .venv/lib/python3.9/site-packages/aiobotocore/context.py:36: in wrapper
       return await func(*args, **kwargs)
   .venv/lib/python3.9/site-packages/aiobotocore/client.py:406: in 
_make_api_call
       http, parsed_response = await self._make_request(
   .venv/lib/python3.9/site-packages/aiobotocore/client.py:432: in _make_request
       return await self._endpoint.make_request(
   .venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:120: in 
_send_request
       while await self._needs_retry(
   .venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:280: in 
_needs_retry
       responses = await self._event_emitter.emit(
   .venv/lib/python3.9/site-packages/aiobotocore/hooks.py:68: in _emit
       response = await resolve_awaitable(handler(**kwargs))
   .venv/lib/python3.9/site-packages/aiobotocore/_helpers.py:6: in 
resolve_awaitable
       return await obj
   .venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:107: in _call
       if await resolve_awaitable(self._checker(**checker_kwargs)):
   .venv/lib/python3.9/site-packages/aiobotocore/_helpers.py:6: in 
resolve_awaitable
       return await obj
   .venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:126: in _call
       should_retry = await self._should_retry(
   .venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:165: in 
_should_retry
       return await resolve_awaitable(
   .venv/lib/python3.9/site-packages/aiobotocore/_helpers.py:6: in 
resolve_awaitable
       return await obj
   .venv/lib/python3.9/site-packages/aiobotocore/retryhandler.py:174: in _call
       checker(attempt_number, response, caught_exception)
   .venv/lib/python3.9/site-packages/botocore/retryhandler.py:247: in __call__
       return self._check_caught_exception(
   .venv/lib/python3.9/site-packages/botocore/retryhandler.py:416: in 
_check_caught_exception
       raise caught_exception
   .venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:201: in 
_do_get_response
       http_response = await self._send(request)
   .venv/lib/python3.9/site-packages/aiobotocore/endpoint.py:303: in _send
       return await self.http_session.send(request)
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
   
   self = <aiobotocore.httpsession.AIOHTTPSession object at 0x114dcd8b0>
   request = <AWSPreparedRequest stream_output=False, method=HEAD, 
url=https://tos-s3-cn-beijing.ivolces.com/hc-dev/test_fileset_it...7a4398025f', 
'amz-sdk-invocation-id': b'b6b522ec-0f7c-47bb-8c7d-bb8018289273', 
'amz-sdk-request': b'attempt=5; max=5'}>
   
       async def send(self, request):
           try:
               proxy_url = self._proxy_config.proxy_url_for(request.url)
               proxy_headers = self._proxy_config.proxy_headers_for(request.url)
               url = request.url
               headers = request.headers
               data = request.body
       
               if ensure_boolean(
                   os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', 
'')
               ):
                   # This is currently an "experimental" feature which provides
                   # no guarantees of backwards compatibility. It may be subject
                   # to change or removal in any patch version. Anyone opting in
                   # to this feature should strictly pin botocore.
                   host = urlparse(request.url).hostname
                   proxy_headers['host'] = host
       
               headers_ = CIMultiDict(
                   (z[0], _text(z[1], encoding='utf-8')) for z in 
headers.items()
               )
       
               # https://github.com/boto/botocore/issues/1255
               headers_['Accept-Encoding'] = 'identity'
       
               if isinstance(data, io.IOBase):
                   data = _IOBaseWrapper(data)
       
               url = URL(url, encoded=True)
               session = await self._get_session(proxy_url)
               response = await session.request(
                   request.method,
                   url=url,
                   chunked=self._chunked(headers_),
                   headers=headers_,
                   data=data,
                   proxy=proxy_url,
                   proxy_headers=proxy_headers,
               )
       
               # botocore converts keys to str, so make sure that they are in
               # the expected case. See detailed discussion here:
               # https://github.com/aio-libs/aiobotocore/pull/116
               # aiohttp's CIMultiDict camel cases the headers :(
               headers = {
                   k.decode('utf-8').lower(): v.decode('utf-8')
                   for k, v in response.raw_headers
               }
       
               http_response = aiobotocore.awsrequest.AioAWSResponse(
                   str(response.url), response.status, headers, response
               )
       
               if not request.stream_output:
                   # Cause the raw stream to be exhausted immediately. We do it
                   # this way instead of using preload_content because
                   # preload_content will never buffer chunked responses
                   await http_response.content
       
               return http_response
           except ClientSSLError as e:
               raise SSLError(endpoint_url=request.url, error=e)
           except (ClientProxyConnectionError, ClientHttpProxyError) as e:
               raise ProxyConnectionError(
                   proxy_url=mask_proxy_url(proxy_url), error=e
               )
           except (
               ServerDisconnectedError,
               aiohttp.ClientPayloadError,
               aiohttp.http_exceptions.BadStatusLine,
           ) as e:
               raise ConnectionClosedError(
                   error=e, request=request, endpoint_url=request.url
               )
           except ServerTimeoutError as e:
               if str(e).lower().startswith('connect'):
   >               raise ConnectTimeoutError(endpoint_url=request.url, error=e)
   E               botocore.exceptions.ConnectTimeoutError: Connect timeout on 
endpoint URL: 
"https://tos-s3-cn-beijing.ivolces.com/xxx/test_fileset_it/test_fileset_schema/test_fileset";
   
   .venv/lib/python3.9/site-packages/aiobotocore/httpsession.py:268: 
ConnectTimeoutError
   ```
   
   ### How to reproduce
   
   How to reproduce
   Gravitino version: 0.9.1
   Python version: 3.9.6
   Environment: Local MacOS, Python virtual environment
   Steps:
   Follow the Gravitino documentation to set up and debug a fileset using the 
Python client.
   In the options dictionary, set s3_endpoint to an external S3 address.
   Run the test, e.g. pytest -v test_fileset.py
   Observe that the client still attempts to connect to the internal S3 
endpoint, resulting in a connection timeout.
   
   ### Additional context
   
   It seems that the s3_endpoint option in the Python client is not being 
respected, and the internal endpoint is used regardless of the value set in 
options. Has anyone encountered this issue or know how to force the client to 
use the specified external endpoint?
   If more environment details or configuration files are needed, please let me 
know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to