Hi Yiming,

"mapreduce.job.reduces" needs to be set at runtime: its value is calculated from the size of the user's tables, so it can't be pre-configured.
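
To illustrate why the reducer count cannot live in a static config file, here is a rough sketch of the kind of calculation involved. It is not the actual Kylin code: the class name, the method names, and the 256 MB-per-reducer threshold are all assumptions made for illustration.

```java
// Illustrative sketch only: the class name, method names, and the
// bytes-per-reducer threshold are assumptions, not Kylin's actual code.
final class ReducerEstimate {
    // Hypothetical amount of input handled by a single reducer (256 MB).
    static final long BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Divide the total table size by the per-reducer budget, rounding up,
    // and never return fewer than one reducer.
    static int numReduces(long totalTableBytes) {
        if (totalTableBytes <= 0) {
            return 1;
        }
        long n = totalTableBytes / BYTES_PER_REDUCER
                + (totalTableBytes % BYTES_PER_REDUCER == 0 ? 0 : 1);
        return (int) Math.min(n, Integer.MAX_VALUE);
    }

    // Build the Hive statement that has to be issued at runtime,
    // because its value depends on the tables of the current build.
    static String setReducesStatement(long totalTableBytes) {
        return "set mapreduce.job.reduces=" + numReduces(totalTableBytes) + ";";
    }
}
```

Because the input size differs from one cube build to the next, the resulting statement differs as well, which is why a fixed value in a config file would not work.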
"hive.merge.mapredfiles=false" can be externalized to the conf file. The Hive merge has not been needed since 1.5.3; I set it in code to ensure it will not be enabled (config files before 1.5.3 have this parameter set to true). The other parameters are optional, I think, but it is better to keep them because they help performance, e.g. dfs.replication=2, compress.codec, etc.

Usually in a Hadoop cluster, Apache Kylin should be treated as a privileged user (rather than a normal user such as an analyst) that can execute the necessary Hadoop/HDFS/HBase/Hive actions (mkdir, create htable, etc.). To achieve this, the administrator needs to do some configuration and authorization. What we need to do is write a document that lists these privileges. What's your opinion?

Thanks for the comment!

2016-07-30 14:03 GMT+08:00 Yiming Liu <[email protected]>:

> Hi Kylin dev,
>
> The first step in building a cube is CreateFlatHiveTable; it will call a
> few Hive configuration commands, such as
> CreateFlatHiveTableStep lines 78 and 79:
> set mapreduce.job.reduces=numReduces
> set hive.merge.mapredfiles=false
>
> Are these commands necessary for the cube building? Could we configure them
> in files? I met some cases where the HiveServer would say "Configuration
> is not allowed to modify at runtime". It will break the build.
>
> Maybe there are some other hard-coded Hadoop commands still. It would be more
> friendly if they could be turned off on demand.
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>

--
Best regards,

Shaofeng Shi
