xloya commented on code in PR #5209:
URL: https://github.com/apache/gravitino/pull/5209#discussion_r1810024498
##########
clients/client-python/gravitino/filesystem/gvfs.py:
##########
@@ -49,6 +49,8 @@ class StorageType(Enum):
     HDFS = "hdfs"
     LOCAL = "file"
     GCS = "gs"
+    S3A = "s3a"
+    S3 = "s3"
Review Comment:
I have the same question: we only use the s3a scheme in the
`S3FileSystemProvider` (https://github.com/apache/gravitino/blob/main/bundles/aws-bundle/src/main/java/org/apache/gravitino/s3/fs/S3FileSystemProvider.java#L44).
Is there any case that will use the s3 scheme?
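If `s3` were kept alongside `s3a`, one option is to resolve both schemes to the same S3 handler. A minimal sketch, assuming a hypothetical `resolve_storage_type` helper and a hypothetical `S3_TYPES` set (neither is part of the PR); the enum mirrors the members shown in the diff:

```python
from enum import Enum
from urllib.parse import urlparse


class StorageType(Enum):
    # Mirrors the enum members shown in the diff.
    HDFS = "hdfs"
    LOCAL = "file"
    GCS = "gs"
    S3A = "s3a"
    S3 = "s3"


# Hypothetical: both S3 schemes share one handler, so an "s3://" path would be
# treated the same as the "s3a://" paths that S3FileSystemProvider produces.
S3_TYPES = frozenset({StorageType.S3A, StorageType.S3})


def resolve_storage_type(actual_path: str) -> StorageType:
    """Map the scheme of an actual storage path to a StorageType member."""
    # Paths without a scheme (e.g. "/tmp/data") are treated as local files.
    scheme = urlparse(actual_path).scheme or StorageType.LOCAL.value
    try:
        return StorageType(scheme)
    except ValueError:
        raise ValueError(f"Unsupported storage scheme: {scheme!r}") from None


def is_s3(storage_type: StorageType) -> bool:
    """True for either S3 scheme."""
    return storage_type in S3_TYPES
```

Whether `s3` needs to be accepted at all depends on the answer to the question above; if only `s3a` paths are ever produced, the `S3` member could simply be dropped.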
##########
clients/client-python/gravitino/filesystem/gvfs.py:
##########
@@ -819,5 +840,40 @@ def _get_gcs_filesystem(self):
         return importlib.import_module("pyarrow.fs").GcsFileSystem()
+    def _get_s3_filesystem(self):
+        # get All keys from the options that start with 'gravitino.bypass.s3.' and remove the prefix
+        s3_options = {
+            key[len(GVFSConfig.GVFS_FILESYSTEM_BY_PASS_S3) :]: value
+            for key, value in self._options.items()
+            if key.startswith(GVFSConfig.GVFS_FILESYSTEM_BY_PASS_S3)
+        }
+
+        # get 'aws_access_key_id' from s3_options, if the key is not found, throw an exception
+        aws_access_key_id = s3_options.get(GVFSConfig.GVFS_FILESYSTEM_S3_ACCESS_KEY)
+        if aws_access_key_id is None:
+            raise GravitinoRuntimeException(
+                "AWS access key id is not found in the options."
+            )
+
+        # get 'aws_secret_access_key' from s3_options, if the key is not found, throw an exception
+        aws_secret_access_key = s3_options.get(GVFSConfig.GVFS_FILESYSTEM_S3_SECRET_KEY)
+        if aws_secret_access_key is None:
+            raise GravitinoRuntimeException(
+                "AWS secret access key is not found in the options."
+            )
+
+        # get 'aws_endpoint_url' from s3_options, if the key is not found, throw an exception
+        aws_endpoint_url = s3_options.get(GVFSConfig.GVFS_FILESYSTEM_S3_ENDPOINT)
+        if aws_endpoint_url is None:
+            raise GravitinoRuntimeException(
+                "AWS endpoint url is not found in the options."
+            )
+
+        return importlib.import_module("pyarrow.fs").S3FileSystem(
Review Comment:
Sorry, I didn't notice this before: GCS and S3 also have fsspec
implementations (https://github.com/fsspec/gcsfs,
https://github.com/fsspec/s3fs). What was the reasoning for choosing
PyArrow's implementation here?
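To make the backend comparison concrete, the sketch below reuses the option extraction from `_get_s3_filesystem` and maps the same three values onto the constructor keywords of either `pyarrow.fs.S3FileSystem` or fsspec's `s3fs.S3FileSystem`. Assumptions: the prefix comes from the comment in the diff, but the key names `access-key`, `secret-key` and `endpoint` are hypothetical stand-ins for the `GVFSConfig` constants (whose values are not shown), `build_s3_kwargs` is a hypothetical helper, and `ValueError` replaces `GravitinoRuntimeException` so the sketch runs standalone. It does not import either library.

```python
from typing import Dict

# Prefix from the comment in the diff; the three key names are hypothetical
# placeholders for the GVFSConfig constants.
BYPASS_S3_PREFIX = "gravitino.bypass.s3."
ACCESS_KEY = "access-key"
SECRET_KEY = "secret-key"
ENDPOINT = "endpoint"


def strip_s3_options(options: Dict[str, str]) -> Dict[str, str]:
    """Keep only keys under the bypass prefix, with the prefix removed."""
    return {
        key[len(BYPASS_S3_PREFIX):]: value
        for key, value in options.items()
        if key.startswith(BYPASS_S3_PREFIX)
    }


def build_s3_kwargs(options: Dict[str, str], backend: str = "pyarrow") -> Dict[str, object]:
    """Return constructor kwargs for the chosen S3 backend (hypothetical helper)."""
    s3_options = strip_s3_options(options)
    # ValueError stands in for GravitinoRuntimeException; all missing keys are
    # reported together rather than one at a time as in the diff.
    missing = [k for k in (ACCESS_KEY, SECRET_KEY, ENDPOINT) if s3_options.get(k) is None]
    if missing:
        raise ValueError(f"Missing S3 options: {', '.join(missing)}")
    access_key = s3_options[ACCESS_KEY]
    secret_key = s3_options[SECRET_KEY]
    endpoint = s3_options[ENDPOINT]
    if backend == "pyarrow":
        # Keyword names accepted by pyarrow.fs.S3FileSystem.
        return {"access_key": access_key, "secret_key": secret_key, "endpoint_override": endpoint}
    if backend == "s3fs":
        # Keyword names accepted by s3fs.S3FileSystem; the endpoint goes to the boto client.
        return {"key": access_key, "secret": secret_key, "client_kwargs": {"endpoint_url": endpoint}}
    raise ValueError(f"Unknown backend: {backend!r}")
```

The result would be passed as `pyarrow.fs.S3FileSystem(**build_s3_kwargs(opts))` or `s3fs.S3FileSystem(**build_s3_kwargs(opts, "s3fs"))`. One difference worth weighing: `s3fs.S3FileSystem` is already an fsspec filesystem, whereas a `pyarrow.fs` filesystem has to be adapted to the fsspec interface (fsspec ships `fsspec.implementations.arrow.ArrowFSWrapper` for that purpose).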
--
This is an automated message from the Apache Git Service.