KYLIN-921 is unrelated to empty segments. It deals with the case where some dimensions are always null while other dimensions have real values. Will https://issues.apache.org/jira/browse/KYLIN-863 handle that case?
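
To make the distinction concrete, here is a minimal, self-contained sketch of the all-null case. It is not Kylin code: the class, the method names and the column names "x" and "y" are hypothetical, and a map entry stands in for the per-column file under fact_distinct_columns. The second method shows one possible remedy (registering every column up front); whether KYLIN-863 does anything like this is exactly the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the fact-distinct-columns step; not Kylin code.
final class NullDimensionSketch {

    // Mimics the reported behaviour: null values are skipped, so a column whose
    // values are all null never gets an entry (the analogue of a missing file).
    static Map<String, Set<String>> distinctSkippingNulls(List<String> columns, List<String[]> rows) {
        Map<String, Set<String>> out = new LinkedHashMap<>();
        for (String[] row : rows) {
            for (int i = 0; i < columns.size(); i++) {
                if (row[i] == null) {
                    continue; // null value is skipped
                }
                out.computeIfAbsent(columns.get(i), k -> new TreeSet<>()).add(row[i]);
            }
        }
        return out;
    }

    // One possible remedy (an assumption, not the actual patch): create an entry
    // for every column first, so an all-null column yields an empty set.
    static Map<String, Set<String>> distinctWithEmptyEntries(List<String> columns, List<String[]> rows) {
        Map<String, Set<String>> out = new LinkedHashMap<>();
        for (String column : columns) {
            out.put(column, new TreeSet<>());
        }
        for (String[] row : rows) {
            for (int i = 0; i < columns.size(); i++) {
                if (row[i] != null) {
                    out.get(columns.get(i)).add(row[i]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("y", "x");
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"a", null}); // "x" is null in every row
        rows.add(new String[] {"b", null});

        // The segment is not empty ("y" has values), yet "x" has no output at all.
        System.out.println(distinctSkippingNulls(columns, rows).containsKey("x"));    // false
        System.out.println(distinctWithEmptyEntries(columns, rows).containsKey("x")); // true
    }
}
```

The point of the sketch is that the segment itself is not empty, so a check for empty segments alone would not necessarily cover it.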

On Mon, Aug 3, 2015 at 3:04 PM, Shaofeng SHI (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651526#comment-14651526 ]
>
> Shaofeng SHI commented on KYLIN-921:
> ------------------------------------
>
> Hi Dayue and hongbin, in 0.7.3 I already made the change to allow empty
> cube segment, that means even if a dimension is null, the job will not fail
> in the third step; the change was already included in KYLIN-863 and made in
> both 0.7 and 0.8;
>
> > Dimension with all nulls cause BuildDimensionDictionary failed due to FileNotFoundException
> > -------------------------------------------------------------------------------------------
> >
> >                 Key: KYLIN-921
> >                 URL: https://issues.apache.org/jira/browse/KYLIN-921
> >             Project: Kylin
> >          Issue Type: Bug
> >          Components: Job Engine
> >    Affects Versions: v0.7.2
> >            Reporter: Dayue Gao
> >            Assignee: ZhouQianhao
> >             Fix For: v0.7.3
> >
> >         Attachments: KYLIN-921.patch
> >
> >
> > From mailing list
> > ----------------------
> > {noformat}
> > I am building a cube with some lookup table in between and getting
> > exception at third step of cube build i.e Build Dimension Dictionary with
> > exception saying
> > java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
> >     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
> >     at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
> >     at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
> >     at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> >     at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > {noformat}
> > The problem is that FactDistinctColumnsMapper's map method skips null
> > values. As a result, if all values of dimension 'x' are null,
> > FactDistinctColumnsReducer will not create file for 'x', thereafter the
> > following job throws FileNotFoundException.
> > {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
> >     public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
> >         try {
> >             // code omitted ...
> >             for (int i : factDictCols) {
> >                 outputKey.set((short) i);
> >                 fieldSchema = schema.get(flatTableIndexes[i]);
> >                 Object fieldValue = record.get(fieldSchema.getName(), schema);
> >                 // NULL VALUE IS SKIPPED
> >                 if (fieldValue == null)
> >                     continue;
> >                 // code omitted ...
> >             }
> >         } catch (Exception ex) {
> >             handleErrorRecord(record, ex);
> >         }
> >     }
> > {code}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
