Dayue Gao created KYLIN-921:
-------------------------------

             Summary: Dimension with all nulls cause BuildDimensionDictionary 
failed due to FileNotFoundException
                 Key: KYLIN-921
                 URL: https://issues.apache.org/jira/browse/KYLIN-921
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v0.7.2
            Reporter: Dayue Gao
            Assignee: ZhouQianhao


From mailing list
----------------------
{noformat}
I am building a cube with some lookup table in between and getting
exception at third step of cube build i.e Build Dimension Dictionary with
exception saying

java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
    at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
    at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
    at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
    at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

The cause is that FactDistinctColumnsMapper's map method skips null values.
As a result, if every value of dimension 'x' is null, the mapper emits
nothing for 'x' and FactDistinctColumnsReducer never creates a file for 'x'.
The following Build Dimension Dictionary step then throws
FileNotFoundException when it looks for that file.

{code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
    try {
        // code omitted ...
        for (int i : factDictCols) {
            outputKey.set((short) i);
            fieldSchema = schema.get(flatTableIndexes[i]);
            Object fieldValue = record.get(fieldSchema.getName(), schema);
            // NULL VALUE IS SKIPPED
            if (fieldValue == null)
                continue;
            // code omitted ...
        }
    } catch (Exception ex) {
        handleErrorRecord(record, ex);
    }
}
{code}
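
The failure mechanism can be reproduced in isolation. The following is a
minimal sketch, not Kylin code: the class DistinctColumnsSketch and its
methods are hypothetical stand-ins that assume the reducer writes one output
file per column key it receives. Rows are plain String arrays, and the
per-column grouping imitates what the shuffle would hand to the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the mapper/reducer pair; not part of Kylin.
class DistinctColumnsSketch {

    // Imitates the map phase plus shuffle: collect distinct values per
    // column index, skipping nulls exactly as the mapper in the issue does.
    static Map<Integer, TreeSet<String>> distinctValues(List<String[]> rows, int columnCount) {
        Map<Integer, TreeSet<String>> grouped = new TreeMap<>();
        for (String[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                if (row[i] == null) {
                    continue; // null value is skipped, so no key is emitted
                }
                grouped.computeIfAbsent(i, k -> new TreeSet<>()).add(row[i]);
            }
        }
        return grouped;
    }

    // Assumption: one output file per key seen by the reducer. Columns that
    // never produced a key are the ones a later step would fail to find.
    static List<Integer> columnsWithoutOutput(List<String[]> rows, int columnCount) {
        Map<Integer, TreeSet<String>> grouped = distinctValues(rows, columnCount);
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            if (!grouped.containsKey(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Column 0 has values, column 1 is null in every row.
        List<String[]> rows = Arrays.asList(
                new String[] {"a", null},
                new String[] {"b", null});
        System.out.println(columnsWithoutOutput(rows, 2)); // prints [1]
    }
}
```

In this sketch, column 1 never appears as a key, which corresponds to the
missing file under fact_distinct_columns in the stack trace above.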



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
