Shaofeng, will you please check this question again?

On Mon, Aug 3, 2015 at 3:08 PM, hongbin ma <[email protected]> wrote:
> KYLIN-921 is irrelevant to empty segments. It's to deal with cases where
> some dimensions are always null (while some other dimensions have solid
> values). Will https://issues.apache.org/jira/browse/KYLIN-863 handle the
> case?
>
> On Mon, Aug 3, 2015 at 3:04 PM, Shaofeng SHI (JIRA) <[email protected]>
> wrote:
>
>>
>> [ https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651526#comment-14651526 ]
>>
>> Shaofeng SHI commented on KYLIN-921:
>> ------------------------------------
>>
>> Hi Dayue and hongbin, in 0.7.3 I already made the change to allow empty
>> cube segment, that means even if a dimension is null, the job will not fail
>> in the third step; the change was already included in KYLIN-863 and made in
>> both 0.7 and 0.8;
>>
>> > Dimension with all nulls cause BuildDimensionDictionary failed due to FileNotFoundException
>> > -------------------------------------------------------------------------------------------
>> >
>> >                 Key: KYLIN-921
>> >                 URL: https://issues.apache.org/jira/browse/KYLIN-921
>> >             Project: Kylin
>> >          Issue Type: Bug
>> >          Components: Job Engine
>> >    Affects Versions: v0.7.2
>> >            Reporter: Dayue Gao
>> >            Assignee: ZhouQianhao
>> >             Fix For: v0.7.3
>> >
>> >         Attachments: KYLIN-921.patch
>> >
>> >
>> > From mailing list
>> > ----------------------
>> > {noformat}
>> > I am building a cube with some lookup table in between and getting
>> > exception at third step of cube build i.e Build Dimension Dictionary with
>> > exception saying
>> > java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
>> >     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
>> >     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
>> >     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>> >     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
>> >     at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
>> >     at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>> >     at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>> >     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>> >     at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>> >     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>> >     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >     at java.lang.Thread.run(Thread.java:745)
>> > {noformat}
>> > The problem is that FactDistinctColumnsMapper's map method skips null
>> > values. As a result, if all values of dimension 'x' are null,
>> > FactDistinctColumnsReducer will not create a file for 'x', and the
>> > following job then throws FileNotFoundException.
>> > {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
>> >     public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
>> >         try {
>> >             // code omitted ...
>> >             for (int i : factDictCols) {
>> >                 outputKey.set((short) i);
>> >                 fieldSchema = schema.get(flatTableIndexes[i]);
>> >                 Object fieldValue = record.get(fieldSchema.getName(), schema);
>> >                 // NULL VALUE IS SKIPPED
>> >                 if (fieldValue == null)
>> >                     continue;
>> >                 // code omitted ...
>> >             }
>> >         } catch (Exception ex) {
>> >             handleErrorRecord(record, ex);
>> >         }
>> >     }
>> > {code}
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
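
P.S. To make the failure mode from the quoted report concrete, here is a minimal sketch in plain Java with no Hadoop or Kylin dependency. Every name in it (NullDimensionSketch, distinctSkippingNulls, distinctWithEmptyColumns) is hypothetical and only illustrates the idea. The first method mimics the reported behavior, where null values are skipped and an all-null column therefore produces no output entry at all. The second shows one possible remedy, which is to register every column up front so that an all-null column still yields an empty entry. I am not claiming this is what KYLIN-921.patch or KYLIN-863 actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, Hadoop-free illustration; none of these names exist in Kylin.
class NullDimensionSketch {

    // Mimics the reported behavior: nulls are skipped, so a column whose
    // values are all null never gets an entry (analogous to a missing file).
    static Map<Integer, TreeSet<String>> distinctSkippingNulls(List<String[]> rows, int columnCount) {
        Map<Integer, TreeSet<String>> out = new LinkedHashMap<>();
        for (String[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                if (row[i] == null) {
                    continue; // null value is skipped
                }
                out.computeIfAbsent(i, k -> new TreeSet<>()).add(row[i]);
            }
        }
        return out;
    }

    // One possible remedy (an assumption, not the actual patch): create an
    // entry for every column first, so an all-null column maps to an empty set.
    static Map<Integer, TreeSet<String>> distinctWithEmptyColumns(List<String[]> rows, int columnCount) {
        Map<Integer, TreeSet<String>> out = new LinkedHashMap<>();
        for (int i = 0; i < columnCount; i++) {
            out.put(i, new TreeSet<>());
        }
        for (String[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                if (row[i] != null) {
                    out.get(i).add(row[i]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"a", null}); // column 1 is always null
        rows.add(new String[] {"b", null});

        // Column 1 is missing entirely in the first variant.
        System.out.println(distinctSkippingNulls(rows, 2).containsKey(1));    // prints false
        // Column 1 is present but empty in the second variant.
        System.out.println(distinctWithEmptyColumns(rows, 2).containsKey(1)); // prints true
    }
}
```

A downstream step that looks up column 1 finds nothing in the first variant, which is the in-memory analogue of the FileNotFoundException in the stack trace. In the second variant it finds an empty set and can build an empty dictionary instead.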
