Hi all,
I suggest isolating the configuration and environment of Hive, Kerberos,
HDFS/S3, Spark, and Flink to improve the user experience and the scalability of
scheduling tasks, and to better support tasks in YARN / Kubernetes environments.
At present, DolphinScheduler is tightly coupled with Hive and Hadoop: the Hive
data source, Kerberos authentication, HDFS/S3 storage, and big-data tasks such
as Spark and Flink. Their configurations must be written into the configuration
file in advance, and the dependent environment must also be loaded in advance
to prevent dependency conflicts.
As a first step, I plan to externalize the Hive and Kerberos configuration and
isolate their environments. See the issue / PR for details:
https://github.com/apache/dolphinscheduler/issues/7623
https://github.com/apache/dolphinscheduler/pull/7624
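To illustrate the idea, here is a minimal sketch of what "externalized" Hive and Kerberos configuration could look like: the connection parameters, including the Kerberos principal, travel with the data source definition instead of living in a global installation-wide config file. All class and field names below are hypothetical and for illustration only, not the actual DolphinScheduler schema.

```java
// Hypothetical sketch: per-datasource Hive connection parameters, including
// Kerberos settings, stored with the datasource instead of in common.properties.
class HiveConnectionParam {
    private final String host;
    private final int port;
    private final String database;
    private final String principal;   // Kerberos principal, e.g. hive/host@REALM (may be null)
    private final String keytabPath;  // keytab uploaded alongside the datasource (may be null)

    HiveConnectionParam(String host, int port, String database,
                        String principal, String keytabPath) {
        this.host = host;
        this.port = port;
        this.database = database;
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    // Build a Hive JDBC URL; when a Kerberos principal is present, append it
    // so authentication happens per connection rather than per installation.
    String jdbcUrl() {
        String url = "jdbc:hive2://" + host + ":" + port + "/" + database;
        if (principal != null && !principal.isEmpty()) {
            url += ";principal=" + principal;
        }
        return url;
    }
}
```

With this shape, two data sources can point at clusters with different Kerberos realms without touching the server's configuration files.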
Next, the HDFS/S3 storage layer will be reworked, and finally the most
important part: the Kubernetes cluster configuration and the Flink and Spark
configuration.
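For the storage step, the direction could be a pluggable storage abstraction so that HDFS and S3 become interchangeable implementations chosen by configuration rather than being wired into the core. The interface and class names below are a hypothetical sketch of the contract, not the actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable storage contract: real implementations
// would wrap the HDFS or S3 client behind the same interface.
interface StorageOperator {
    void upload(String path, byte[] data);
    byte[] download(String path);
    boolean exists(String path);
}

// In-memory stand-in, used here only to show the shape of the contract.
class InMemoryStorage implements StorageOperator {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override public void upload(String path, byte[] data) { files.put(path, data); }
    @Override public byte[] download(String path) { return files.get(path); }
    @Override public boolean exists(String path) { return files.containsKey(path); }
}
```

The scheduler core would then depend only on the interface, and each backend's client libraries could be loaded in an isolated classloader or plugin, which is what avoids the dependency conflicts described above.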
Narcasserun