xloya commented on PR #4232: URL: https://github.com/apache/gravitino/pull/4232#issuecomment-2290821704
Hi all, let me share some real production cases. Since I'm mainly responsible for integrating Gravitino Fileset with our internal engines and platforms, I ran into real dependency conflicts during the rollout.

1. On the server side: our current setup supports both HDFS and MiFS (an internal storage system that supports various object storages) in one Fileset Catalog. When we introduced MiFS, the biggest problem the server hit during integration testing was that the MiFS dependencies conflicted with some Gravitino dependencies. We had two options:
   a. Ask the MiFS R&D team to shade these dependencies (for public cloud storage, I don't think this is possible).
   b. Exclude the MiFS dependencies during integration testing, which means we cannot run MiFS integration tests.
2. In the GVFS client: since it needs to run inside Spark, we tried adding the MiFS dependencies directly to GVFS, but they can still conflict with some of Spark's dependencies. We would have to shade these dependencies in GVFS, and some of them cannot be shaded at all. The solution we finally adopted was to keep the MiFS dependencies independent of the GVFS client and ask the MiFS developers to shade them before we introduce them separately into Spark.

My point is that as the number of supported storages grows, dependency conflicts are inevitable, so on-demand loading may be a more reasonable approach. I still hope that one Catalog can support multiple storage types, with the set of supported storage types decided by the maintainer.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
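
To make the on-demand loading idea from the comment above concrete, here is a minimal, language-neutral sketch in Python. It is not Gravitino's implementation: the `StorageProviderRegistry` class, its `providers`/`enabled` parameters, and the `"module:attribute"` entry-point strings are all hypothetical, and standard-library functions stand in for real storage clients so the example runs as-is. It only illustrates two points: a provider is imported the first time its scheme is requested, and the maintainer decides which storage types a catalog accepts.

```python
import importlib
from typing import Any, Dict, Iterable, List


class StorageProviderRegistry:
    """Hypothetical registry: maps a storage scheme to a lazily imported provider."""

    def __init__(self, providers: Dict[str, str], enabled: Iterable[str]) -> None:
        # providers: scheme -> "module:attribute" entry point (assumed format)
        self._providers = dict(providers)
        # enabled: the storage types the maintainer allows for this catalog
        self._enabled = set(enabled)
        self._loaded: Dict[str, Any] = {}

    def get(self, scheme: str) -> Any:
        if scheme not in self._enabled or scheme not in self._providers:
            raise ValueError(f"storage type {scheme!r} is not enabled for this catalog")
        if scheme not in self._loaded:
            # The provider module is resolved only now, on first use.
            module_name, _, attr = self._providers[scheme].partition(":")
            module = importlib.import_module(module_name)
            self._loaded[scheme] = getattr(module, attr)
        return self._loaded[scheme]

    def loaded_schemes(self) -> List[str]:
        return sorted(self._loaded)


# Stand-in entry points from the standard library; a real deployment would
# point these at separately packaged storage clients.
registry = StorageProviderRegistry(
    providers={"hdfs": "posixpath:join", "mifs": "ntpath:join", "s3": "json:dumps"},
    enabled=["hdfs", "mifs"],
)

hdfs_join = registry.get("hdfs")
print(hdfs_join("/warehouse", "fileset1"))  # /warehouse/fileset1
print(registry.loaded_schemes())            # ['hdfs']
```

In this sketch `mifs` is enabled but never resolved until something asks for it, and `s3` is known but rejected because the maintainer did not enable it. On the JVM the lazy import would roughly correspond to loading each storage's jars on demand rather than bundling them with the server or client, but that mapping is an assumption here, not something the comment specifies.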
