[jira] [Commented] (KYLIN-4990) When building the Hive global dictionary table, make MR read files from a specified location instead of the HDFS root directory

Xiaoxiang Yu (Jira) Mon, 26 Apr 2021 02:29:06 -0700


    [ 
https://issues.apache.org/jira/browse/KYLIN-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331931#comment-17331931
 ]


Xiaoxiang Yu commented on KYLIN-4990:
-------------------------------------

Hello [~linlin994395], this is quite a complex situation. Could you contact me via
WeChat so that we can discuss it directly?

> When building the Hive global dictionary table, make MR read files from a specified location instead of the HDFS root directory
> ------------------------------------
>
>                 Key: KYLIN-4990
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4990
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v3.1.1
>            Reporter: xue lin
>            Priority: Major
>         Attachments: s3-hive-全局字典表.png
>
>
> I followed the documents below to build a Hive global dictionary table for a cube that involves bitmap:
> [http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html]
> [https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary]
> https://issues.apache.org/jira/browse/KYLIN-4616
> Ideally, I would like to keep all the tables on S3. My current configuration is as follows:
> -----------------------
> # kylin_hive_conf.xml
> <property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value>s3://etl-script-product/hive-kylin-dict</value>
>  <description>location of default database for the warehouse</description>
> </property>
> -----------------------
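> See the attachment for how the tables are stored on S3.
> However, when Kylin reaches the step "Build Hive Global Dict - parallel part build", it fails with the following error: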
> ---------------------------
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020/kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value/dict_column=VIEW_FACT_REMAIN_ID
>  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
>  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:358)
>  at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
>  at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
>  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
>  at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:198)
>  at org.apache.kylin.engine.mr.steps.BuildGlobalHiveDictPartBuildJob.run(BuildGlobalHiveDictPartBuildJob.java:109)
>  at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:155)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> ---------------------------
> Changing the hive.metastore.warehouse.dir parameter as follows works around the problem:
> -----------------------
> # kylin_hive_conf.xml
> <property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value>/</value>
>  <description>location of default database for the warehouse</description>
> </property>
> -----------------------
> Is there a parameter that changes the path from which MR reads files during "Build Hive Global Dict - parallel part build"?
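
The following Python sketch illustrates one possible reading of the report above. It is not Kylin code. It assumes, based only on the error message and the workaround, that the MR step looks for the table directory directly under the root of the default filesystem, while Hive places a table of the default database under hive.metastore.warehouse.dir. The helper names and the table name `example_distinct_value` are hypothetical, and the path handling is a simplification of how Hadoop qualifies a scheme-less path against fs.defaultFS.

```python
from urllib.parse import urlparse

# Default filesystem taken from the error message in the report.
DEFAULT_FS = "hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020"


def qualify(path: str, default_fs: str) -> str:
    """Simplified model: a path without a scheme is resolved against the default filesystem."""
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


def hive_table_location(warehouse_dir: str, table: str, default_fs: str) -> str:
    """Where Hive would store a table of the default database (warehouse dir + table name)."""
    return qualify(warehouse_dir, default_fs).rstrip("/") + "/" + table


def assumed_job_input_path(table: str, default_fs: str) -> str:
    """ASSUMPTION: the MR step reads '/<table>' on the default filesystem."""
    return qualify("/" + table, default_fs)


table = "example_distinct_value"  # hypothetical table name

s3_location = hive_table_location("s3://etl-script-product/hive-kylin-dict", table, DEFAULT_FS)
root_location = hive_table_location("/", table, DEFAULT_FS)
job_path = assumed_job_input_path(table, DEFAULT_FS)

print("Hive table location (S3 warehouse):", s3_location)
print("Hive table location ('/' warehouse):", root_location)
print("Path the MR step is assumed to read:", job_path)
```

Under this assumption the two locations only coincide when the warehouse directory is `/`, which would be consistent with the workaround described in the report.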



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
