After I applied the patch on 0.7 staging (commit 388488be80d24ecb4d3209726c90a1b829a18087), CI failed because FactDistinctColumnsJob failed. The error message is at https://gist.github.com/binmahone/9a6d8bdff37d94f59dea.
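To confirm which expected keys are missing from a job's configuration, such as `fact.dict.column.rowkey.indexes` and `fact.dict.column.names`, a mechanical check can be quicker than reading the job history page by eye. This is a minimal sketch, assuming the configuration has already been dumped into a plain `Map<String, String>`. `ConfCheck` and `missingProperties` are hypothetical names and are not part of Kylin or Hadoop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfCheck {
    // Hypothetical helper (not a Kylin/Hadoop API): returns the required keys
    // that are absent from the dumped configuration or have a blank value.
    static List<String> missingProperties(Map<String, String> conf, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String key : required) {
            String value = conf.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> required = List.of("fact.dict.column.rowkey.indexes", "fact.dict.column.names");

        // Illustrative dump in which neither property reached the job configuration.
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.job.name", "example-job");

        // Prints: Missing properties: [fact.dict.column.rowkey.indexes, fact.dict.column.names]
        System.out.println("Missing properties: " + missingProperties(conf, required));
    }
}
```

Blank values are treated as missing on purpose, because an empty column list would likely fail in the same way as an absent one.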
I checked the job's configuration on the job history page, and it seems the properties "fact.dict.column.rowkey.indexes" and "fact.dict.column.names" were not successfully set in the job's configuration. I'm looking into the issue; any comments are welcome.

On Mon, Aug 3, 2015 at 9:32 PM, Dayue Gao (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651856#comment-14651856 ]
>
> Dayue Gao commented on KYLIN-921:
> ---------------------------------
>
> Thanks Hongbin, Shaofeng! Glad to know it has been fixed in 0.7.3 :-)
>
> > Dimension with all nulls cause BuildDimensionDictionary failed due to FileNotFoundException
> > -------------------------------------------------------------------------------------------
> >
> >                 Key: KYLIN-921
> >                 URL: https://issues.apache.org/jira/browse/KYLIN-921
> >             Project: Kylin
> >          Issue Type: Bug
> >          Components: Job Engine
> >    Affects Versions: v0.7.2
> >            Reporter: Dayue Gao
> >            Assignee: ZhouQianhao
> >             Fix For: v0.7.3
> >
> >         Attachments: KYLIN-921.patch
> >
> >
> > From mailing list
> > ----------------------
> > {noformat}
> > I am building a cube with some lookup table in between and getting
> > exception at third step of cube build i.e Build Dimension Dictionary with
> > exception saying
> > java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
> >     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
> >     at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
> >     at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
> >     at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> >     at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > {noformat}
> > The problem is that FactDistinctColumnsMapper's map method skips null values. As a result, if all values of dimension 'x' are null, FactDistinctColumnsReducer will not create file for 'x', thereafter the following job throws FileNotFoundException.
> >
> > {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
> > public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
> >     try {
> >         // code ommited ...
> >         for (int i : factDictCols) {
> >             outputKey.set((short) i);
> >             fieldSchema = schema.get(flatTableIndexes[i]);
> >             Object fieldValue = record.get(fieldSchema.getName(), schema);
> >             // NULL VALUE IS SKIPPED
> >             if (fieldValue == null)
> >                 continue;
> >             // code ommited ...
> >         }
> >     } catch (Exception ex) {
> >         handleErrorRecord(record, ex);
> >     }
> > }
> > {code}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)

--
Regards,
*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
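The null-skipping behaviour described in KYLIN-921 can be reproduced with a small plain-Java model of the mapper/reducer pair. This is a sketch, not Kylin code. `DistinctColumnsModel`, `distinctValues`, and the `preRegister` flag are hypothetical. An in-memory map stands in for the per-column files that FactDistinctColumnsReducer writes. Pre-registering every dictionary column is only one way to guarantee an output for each column, and the actual 0.7.3 fix may work differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class DistinctColumnsModel {
    // Collects distinct non-null values per dictionary column (a stand-in for mapper + reducer).
    // When preRegister is true, every column gets an entry up front, even if it stays empty,
    // mimicking "always create an output file for each dictionary column".
    static Map<Integer, TreeSet<String>> distinctValues(List<String[]> rows, int[] dictCols, boolean preRegister) {
        Map<Integer, TreeSet<String>> out = new LinkedHashMap<>();
        if (preRegister) {
            for (int col : dictCols) {
                out.put(col, new TreeSet<>());
            }
        }
        for (String[] row : rows) {
            for (int col : dictCols) {
                String value = row[col];
                if (value == null) {
                    continue; // same as the quoted mapper: null values are skipped
                }
                out.computeIfAbsent(col, k -> new TreeSet<>()).add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Column 1 is null in every row, like dimension 'x' in the JIRA description.
        List<String[]> rows = Arrays.asList(
                new String[] {"CN", null},
                new String[] {"US", null},
                new String[] {"CN", null});
        int[] dictCols = {0, 1};

        // Prints {0=[CN, US]}: column 1 has no entry, i.e. no "file" for the next step to read.
        System.out.println(distinctValues(rows, dictCols, false));
        // Prints {0=[CN, US], 1=[]}: column 1 exists but is empty.
        System.out.println(distinctValues(rows, dictCols, true));
    }
}
```

In the first case a downstream step that expects one output per dictionary column hits the equivalent of the FileNotFoundException above. In the second case it finds an empty output instead and can build an empty dictionary.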
