KYLIN-921 is unrelated to empty segments. It deals with cases where
some dimensions are always null (while other dimensions have non-null
values). Will https://issues.apache.org/jira/browse/KYLIN-863 handle that
case?
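
A minimal sketch of the all-null case, assuming a simplified in-memory
stand-in for the mapper and reducer. The class and method names are
hypothetical and are not Kylin's; the second method shows one possible
remedy (always emitting an entry per column), not necessarily what either
patch does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not Kylin code.
class DistinctColumnsSketch {

    // Collects distinct values per column index and skips nulls, mirroring
    // the "if (fieldValue == null) continue;" behaviour in the mapper.
    // A column whose values are all null never gets an entry, which
    // corresponds to the missing per-column output file.
    static Map<Integer, Set<String>> distinctSkippingNulls(List<String[]> rows, int columnCount) {
        Map<Integer, Set<String>> out = new TreeMap<>();
        for (String[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                if (row[i] == null) {
                    continue;
                }
                out.computeIfAbsent(i, k -> new TreeSet<>()).add(row[i]);
            }
        }
        return out;
    }

    // Assumed remedy: register every column up front so that an all-null
    // column still yields an (empty) entry for the next step to read.
    static Map<Integer, Set<String>> distinctWithEmptyEntries(List<String[]> rows, int columnCount) {
        Map<Integer, Set<String>> out = new TreeMap<>();
        for (int i = 0; i < columnCount; i++) {
            out.put(i, new TreeSet<>());
        }
        for (String[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                if (row[i] != null) {
                    out.get(i).add(row[i]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Column 0 has real values; column 1 is null in every row.
        List<String[]> rows = Arrays.asList(
                new String[] {"a", null},
                new String[] {"b", null});
        System.out.println(distinctSkippingNulls(rows, 2).containsKey(1));    // false
        System.out.println(distinctWithEmptyEntries(rows, 2).containsKey(1)); // true
    }
}
```

Note that the segment here is not empty: column 0 still has values, and
only column 1 lacks output.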

On Mon, Aug 3, 2015 at 3:04 PM, Shaofeng SHI (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651526#comment-14651526
> ]
>
> Shaofeng SHI commented on KYLIN-921:
> ------------------------------------
>
> Hi Dayue and Hongbin, in 0.7.3 I already made the change to allow empty
> cube segments; that means even if a dimension is null, the job will not fail
> in the third step. The change was already included in KYLIN-863 and made in
> both 0.7 and 0.8.
>
> > Dimension with all nulls cause BuildDimensionDictionary failed due to
> FileNotFoundException
> >
> -------------------------------------------------------------------------------------------
> >
> >                 Key: KYLIN-921
> >                 URL: https://issues.apache.org/jira/browse/KYLIN-921
> >             Project: Kylin
> >          Issue Type: Bug
> >          Components: Job Engine
> >    Affects Versions: v0.7.2
> >            Reporter: Dayue Gao
> >            Assignee: ZhouQianhao
> >             Fix For: v0.7.3
> >
> >         Attachments: KYLIN-921.patch
> >
> >
> > From mailing list
> > ----------------------
> > {noformat}
> > I am building a cube with some lookup table in between and getting
> > exception at third step of cube build i.e Build Dimension Dictionary with
> > exception saying
> > java.io.FileNotFoundException: File does not exist:
> >
> /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
> > at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
> > at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
> > at
> >
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> > at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
> > at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
> > at
> >
> org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
> > at
> org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> > at
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
> > at
> >
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> > at
> >
> org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > at
> >
> org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > at
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > at
> >
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > at
> >
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> > at
> >
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> > {noformat}
> > The problem is that FactDistinctColumnsMapper's map method skips null
> values. As a result, if all values of dimension 'x' are null,
> FactDistinctColumnsReducer will not create a file for 'x', and the
> following job then throws FileNotFoundException.
> > {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
> > public void map(KEYIN key, HCatRecord record, Context context) throws
> IOException, InterruptedException {
> >         try {
> >             // code omitted ...
> >             for (int i : factDictCols) {
> >                 outputKey.set((short) i);
> >                 fieldSchema = schema.get(flatTableIndexes[i]);
> >                 Object fieldValue = record.get(fieldSchema.getName(),
> schema);
> >                 // NULL VALUE IS SKIPPED
> >                 if (fieldValue == null)
> >                     continue;
> >                 // code omitted ...
> >             }
> >         } catch (Exception ex) {
> >             handleErrorRecord(record, ex);
> >         }
> >     }
> > {code}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
