Hi all,
I suggest isolating the configuration and environment of Hive, Kerberos,
HDFS/S3, Spark, and Flink to improve the user experience and the scalability of
scheduling tasks, and to better support tasks in YARN / Kubernetes environments.
At present, DolphinScheduler is tightly coupled with Hive and Hadoop: the Hive
data source, Kerberos authentication, HDFS/S3 storage, and big-data tasks such
as Spark and Flink. Their configurations must be written into the configuration
file in advance, and the dependent environment must also be loaded in advance
to prevent dependency conflicts.
As a first step, I plan to externalize the Hive and Kerberos configuration and
isolate their environments. See the issue / PR for details:
https://github.com/apache/dolphinscheduler/issues/7623
https://github.com/apache/dolphinscheduler/pull/7624
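To illustrate the idea, here is a minimal sketch of what "externalized" Hive and Kerberos configuration could look like: the connection parameters, including the Kerberos principal, travel with the data source definition instead of living in a global installation-wide config file. All class and field names below are hypothetical and for illustration only, not the actual DolphinScheduler schema.

```java
// Hypothetical sketch: per-datasource Hive connection parameters, including
// Kerberos settings, stored with the datasource instead of in common.properties.
class HiveConnectionParam {
    private final String host;
    private final int port;
    private final String database;
    private final String principal;   // Kerberos principal, e.g. hive/host@REALM (may be null)
    private final String keytabPath;  // keytab uploaded alongside the datasource (may be null)

    HiveConnectionParam(String host, int port, String database,
                        String principal, String keytabPath) {
        this.host = host;
        this.port = port;
        this.database = database;
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    // Build a Hive JDBC URL; when a Kerberos principal is present, append it
    // so authentication happens per connection rather than per installation.
    String jdbcUrl() {
        String url = "jdbc:hive2://" + host + ":" + port + "/" + database;
        if (principal != null && !principal.isEmpty()) {
            url += ";principal=" + principal;
        }
        return url;
    }
}
```

With this shape, two data sources can point at clusters with different Kerberos realms without touching the server's configuration files.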
Next, the HDFS/S3 storage layer will be reworked, and finally the most
important part: the Kubernetes cluster configuration and the Flink and Spark
configuration.
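For the storage step, the direction could be a pluggable storage abstraction so that HDFS and S3 become interchangeable implementations chosen by configuration rather than being wired into the core. The interface and class names below are a hypothetical sketch of the contract, not the actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable storage contract: real implementations
// would wrap the HDFS or S3 client behind the same interface.
interface StorageOperator {
    void upload(String path, byte[] data);
    byte[] download(String path);
    boolean exists(String path);
}

// In-memory stand-in, used here only to show the shape of the contract.
class InMemoryStorage implements StorageOperator {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override public void upload(String path, byte[] data) { files.put(path, data); }
    @Override public byte[] download(String path) { return files.get(path); }
    @Override public boolean exists(String path) { return files.containsKey(path); }
}
```

The scheduler core would then depend only on the interface, and each backend's client libraries could be loaded in an isolated classloader or plugin, which is what avoids the dependency conflicts described above.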
Narcasserun