Our Hadoop cluster has multiple YARN queues for running Hadoop jobs
(such as MR and Spark), each with a different resource capacity.

 

However, the current implementation of IntermediateHiveTableStep has no
option for users to specify the YARN queue, so the "hive -e" command
always runs in the *DEFAULT* queue. Unfortunately, the *DEFAULT* queue
might not have enough resources configured.

 

I think it would be great to allow users to specify the queue in which
KYLIN jobs run, and as far as I know it can be accomplished easily:

1. In kylin.properties, specify the MR argument, for example
"kylin.job.cmd.extra.args=-D mapreduce.job.queuename=your_yarn_queue"

2. Modify KylinConfig to add an option for the YARN queue

3. Modify the createIntermediateHiveTableStep method of AbstractJobBuilder
to append "SET mapreduce.job.queuename=your_yarn_queue" to the "hive -e"
command

Steps 2 and 3 need only a little coding.
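
To make steps 2 and 3 concrete, here is a rough, standalone sketch of the
idea. It is not the actual KylinConfig or AbstractJobBuilder code: the
property key "kylin.job.yarn.queue", the class name, and both method names
are hypothetical, and java.util.Properties stands in for KylinConfig. It
only shows reading an optional queue name and prepending the SET statement
to the "hive -e" command.

```java
import java.util.Properties;

public class HiveQueueSketch {
    // Hypothetical property key; the real name would be decided in KylinConfig.
    public static final String QUEUE_KEY = "kylin.job.yarn.queue";

    // Returns the SET statement for the configured queue, or an empty
    // string when no queue is configured (Hive then uses its default queue).
    public static String queuePrefix(Properties config) {
        String queue = config.getProperty(QUEUE_KEY, "").trim();
        return queue.isEmpty() ? "" : "SET mapreduce.job.queuename=" + queue + ";";
    }

    // Builds the "hive -e" command line with the queue setting placed first.
    // Assumes the HQL contains no double quotes; real code would need escaping.
    public static String buildHiveCommand(Properties config, String hql) {
        return "hive -e \"" + queuePrefix(config) + hql + "\"";
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(QUEUE_KEY, "your_yarn_queue");
        System.out.println(buildHiveCommand(config, "SELECT 1;"));
    }
}
```

When the property is unset the command is left unchanged, so existing
deployments would keep their current behavior.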

 

I am not sure whether the above approach is the best way of doing it, so I
would like to hear opinions from the KYLIN community.

 


Thanks,

Hua
