Re: [PR] [#3379] feat(catalog-hadoop): Add S3 support for Fileset Hadoop catalog [gravitino]

via GitHub Thu, 15 Aug 2024 00:37:48 -0700


xloya commented on PR #4232:
URL: https://github.com/apache/gravitino/pull/4232#issuecomment-2290821704


   Hi all, let me share some real production cases.
   I am mainly responsible for integrating Gravitino Fileset with our internal 
engines and platforms, and I did run into real dependency conflicts during the 
rollout.
   1. On the server side: we currently support both HDFS and MiFS (an internal 
storage system that supports various object storages) in a single Fileset 
Catalog. When we introduced MiFS, the biggest problem the server hit during 
integration testing was that the MiFS dependencies conflicted with some 
Gravitino dependencies. At that point we had only two options:
   a. Ask the MiFS developers to shade these dependencies (for public cloud 
storage, I do not think this is possible).
   b. Exclude the MiFS dependencies during integration testing, which means 
MiFS integration tests cannot be run.
   2. In the GVFS client: since it has to run inside Spark, we tried adding the 
MiFS dependencies directly to GVFS, but they could still conflict with some of 
Spark's dependencies. We would have to shade these dependencies in GVFS, and 
some of them cannot be shaded at all. The solution we finally adopted was to 
keep the MiFS dependencies independent of the GVFS client, and to ask the MiFS 
developers to shade them before they are added to Spark separately.
   My point is that as the number of supported storages grows, dependency 
conflicts are inevitable, so on-demand loading may be a more reasonable 
approach. I still hope that one Catalog can support multiple storage types, 
with the set of supported types determined by the maintainer.
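
   To make the idea concrete, here is a minimal sketch of on-demand loading 
with a maintainer-chosen set of storage types. It is not Gravitino code: 
`OnDemandStorageRegistry`, `StorageProvider`, and the scheme names are 
hypothetical, and a real implementation would probably have each factory 
build its provider from a separately packaged (and, where possible, shaded) 
jar rather than from classes on the main classpath.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical registry: a storage provider is created only when first requested. */
final class OnDemandStorageRegistry {

    /** Hypothetical stand-in for a storage-specific file system provider. */
    public interface StorageProvider {
        String scheme();
    }

    // Storage types the catalog maintainer has chosen to enable.
    private final Set<String> enabledSchemes;
    // Factories are cheap to register; nothing storage-specific is created yet.
    private final Map<String, Supplier<StorageProvider>> factories = new HashMap<>();
    // Providers that have actually been instantiated so far.
    private final Map<String, StorageProvider> loaded = new HashMap<>();

    OnDemandStorageRegistry(Set<String> enabledSchemes) {
        this.enabledSchemes = new HashSet<>(enabledSchemes);
    }

    void register(String scheme, Supplier<StorageProvider> factory) {
        factories.put(scheme, factory);
    }

    /** Returns the provider for a scheme, instantiating it on first use. */
    synchronized StorageProvider get(String scheme) {
        if (!enabledSchemes.contains(scheme)) {
            throw new IllegalArgumentException("Storage type not enabled: " + scheme);
        }
        Supplier<StorageProvider> factory = factories.get(scheme);
        if (factory == null) {
            throw new IllegalStateException("No provider registered for: " + scheme);
        }
        return loaded.computeIfAbsent(scheme, s -> factory.get());
    }

    synchronized boolean isLoaded(String scheme) {
        return loaded.containsKey(scheme);
    }
}
```

   With this shape, a storage type that is registered but not enabled is 
rejected, and an enabled one costs nothing until the first fileset actually 
uses it.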


