[ https://issues.apache.org/jira/browse/KYLIN-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331931#comment-17331931 ]
Xiaoxiang Yu commented on KYLIN-4990:
-------------------------------------

Hello [~linlin994395], this is quite a complex situation. Could you contact me via WeChat so we can discuss it directly?

> When building the global dictionary table with Hive, let the MR job read files from a specified location instead of the HDFS root directory
> ------------------------------------
>
>                 Key: KYLIN-4990
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4990
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v3.1.1
>            Reporter: xue lin
>            Priority: Major
>         Attachments: s3-hive-全局字典表.png
>
>
> I followed the documents below to build a Hive global dictionary table for bitmap measures:
> [http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html]
> [https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary]
> https://issues.apache.org/jira/browse/KYLIN-4616
>
> Ideally, I would like all of the tables to be stored on S3, so I used the following configuration:
> -----------------------
> # kylin_hive_conf.xml
> <property>
>   <name>hive.metastore.warehouse.dir</name>
>   <value>s3://etl-script-product/hive-kylin-dict</value>
>   <description>location of default database for the warehouse</description>
> </property>
> -----------------------
> See the attachment for how the tables are stored on S3.
>
> However, when Kylin reaches the "Build Hive Global Dict - parallel part build" step, it fails with the following error:
> ---------------------------
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020/kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value/dict_column=VIEW_FACT_REMAIN_ID
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:358)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
>     at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:198)
>     at org.apache.kylin.engine.mr.steps.BuildGlobalHiveDictPartBuildJob.run(BuildGlobalHiveDictPartBuildJob.java:109)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:155)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> ---------------------------
>
> The error can be worked around by changing hive.metastore.warehouse.dir to the following:
> -----------------------
> # kylin_hive_conf.xml
> <property>
>   <name>hive.metastore.warehouse.dir</name>
>   <value>/</value>
>   <description>location of default database for the warehouse</description>
> </property>
> -----------------------
>
> Is there a parameter that changes the path from which the MR job reads its input files during the "Build Hive Global Dict - parallel part build" step?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)