[
https://issues.apache.org/jira/browse/KYLIN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shao Feng Shi resolved KYLIN-4320.
----------------------------------
Resolution: Fixed
> number of replicas of Cuboid files cannot be configured for Spark engine
> ------------------------------------------------------------------------
>
> Key: KYLIN-4320
> URL: https://issues.apache.org/jira/browse/KYLIN-4320
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v3.0.1
> Reporter: Congling Xia
> Assignee: Yaqian Zhang
> Priority: Major
> Fix For: v3.1.0
>
> Attachments: cuboid_replications.png
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Hi, team. I tried to change `dfs.replication` to 3 by adding the following
> config override:
> {code:java}
> kylin.engine.spark-conf.spark.hadoop.dfs.replication=3
> {code}
> Then I got a strange result: the number of replicas of cuboid files varies,
> even among files at the same level.
> !cuboid_replications.png!
> I guess it is due to the conflicting settings in SparkUtil:
> {code:java}
> public static void modifySparkHadoopConfiguration(SparkContext sc) throws Exception {
>     // cuboid intermediate files, replication=2
>     sc.hadoopConfiguration().set("dfs.replication", "2");
>     sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");
>     sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
>     // or org.apache.hadoop.io.compress.SnappyCodec
>     sc.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.codec",
>             "org.apache.hadoop.io.compress.DefaultCodec");
> }
> {code}
> It may be a problem with Spark property precedence. After checking the [Spark
> documents|#dynamically-loading-spark-properties], it seems that some
> programmatically set properties may not take effect, and that setting them
> this way is not a recommended approach to Spark job configuration.
>
> In any case, cuboid files may survive for weeks until they expire or are
> merged, so the configuration override in
> `org.apache.kylin.engine.spark.SparkUtil#modifySparkHadoopConfiguration`
> makes those files less reliable.
> Is there any way to force cuboid files to keep 3 replicas? Or shall we
> remove the code in SparkUtil so that
> kylin.engine.spark-conf.spark.hadoop.dfs.replication works properly?
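
One possible way to reconcile the two settings discussed above is to treat the hard-coded value as a default that applies only when the user has not supplied an override. The sketch below illustrates that idea with plain Java maps standing in for the Spark and Hadoop configurations. The class and method names (`ReplicationOverrideSketch`, `toHadoopConf`, `withDefaults`) are hypothetical and are not part of Kylin or Spark. It assumes that `spark.hadoop.*` properties reach the Hadoop configuration with the prefix stripped. It is not necessarily how the issue was actually fixed.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicationOverrideSketch {
    // Assumption: Spark copies "spark.hadoop.<key>" entries into the Hadoop
    // configuration as "<key>".
    static final String SPARK_HADOOP_PREFIX = "spark.hadoop.";

    /** Builds a Hadoop-style map from the spark.hadoop.* entries of a Spark-style map. */
    static Map<String, String> toHadoopConf(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(SPARK_HADOOP_PREFIX)) {
                String key = e.getKey().substring(SPARK_HADOOP_PREFIX.length());
                hadoopConf.put(key, e.getValue());
            }
        }
        return hadoopConf;
    }

    /** Applies the engine default for dfs.replication only if no override exists. */
    static Map<String, String> withDefaults(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = toHadoopConf(sparkConf);
        // Hypothetical default of 2, mirroring the hard-coded value in SparkUtil.
        hadoopConf.putIfAbsent("dfs.replication", "2");
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> overridden = new HashMap<>();
        overridden.put("spark.hadoop.dfs.replication", "3");
        // The user override is kept.
        System.out.println(withDefaults(overridden).get("dfs.replication"));
        // With no override, the default applies.
        System.out.println(withDefaults(new HashMap<>()).get("dfs.replication"));
    }
}
```

In real code the equivalent check would presumably be made against the job's `SparkConf` (for example, testing whether `spark.hadoop.dfs.replication` is present) before calling `sc.hadoopConfiguration().set(...)`. Whether that fits Kylin's job setup is an assumption here.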