Hi Hongbin,

I'm not sure how your CI is configured, but I did reproduce the NPE when I 
tested the patch with the following workflow (it took me quite a while to 
figure out). Say my Kylin binary is deployed under 
$DIR/kylin-mt-v0.7.2.1:

1. stop the old instance. "$DIR/kylin-mt-v0.7.2.1/bin/kylin.sh stop"
2. generate a new tarball using './script/package.sh' and put it into $DIR
3. "replace" the binary. "cd $DIR; tar xzf kylin-mt-v0.7.2.1.tar.gz"
4. start the new instance and build the cube

The problem is that 
$DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin/WEB-INF/lib/kylin-job-mt-v0.7.2.1.jar
is still the old one. The Kylin web app uses this jar to submit the job 
(without the new configuration params), while the reduce task uses 
lib/kylin-job-mt-v0.7.2.1.jar (the new one) to read these two params, which 
results in the NPE.

The fix is to remove $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin before step 2.
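
For reference, here is a minimal sketch of the workflow above with the cleanup 
step included. The function name and its two arguments are my own invention 
for illustration, and it assumes the tarball unpacks into $DIR/<name> and has 
already been built with package.sh and copied into $DIR:

```shell
#!/usr/bin/env bash
# Sketch only: redeploy a Kylin tarball without leaving a stale exploded
# webapp behind. redeploy_kylin and its arguments are hypothetical names.
redeploy_kylin() {
    local dir="$1"            # deployment root ($DIR in this mail)
    local name="$2"           # e.g. kylin-mt-v0.7.2.1
    local home="$dir/$name"

    # Stop the old instance; give up if the stop script reports failure.
    "$home/bin/kylin.sh" stop || return 1

    # Remove the exploded web application so that the old
    # WEB-INF/lib/kylin-job-*.jar cannot survive the upgrade.
    rm -rf "$home/tomcat/webapp/kylin"

    # "Replace" the binary by unpacking the new tarball over the old tree.
    tar xzf "$dir/$name.tar.gz" -C "$dir" || return 1

    # Start the new instance.
    "$home/bin/kylin.sh" start
}
```

Called as: redeploy_kylin "$DIR" kylin-mt-v0.7.2.1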

Best,
Dayue

> On Aug 3, 2015, at 9:40 PM, hongbin ma <[email protected]> wrote:
> 
> After I applied the patch on 0.7 staging (commit
> 388488be80d24ecb4d3209726c90a1b829a18087), CI failed because
> FactDistinctColumnsJob failed. The error message is at
> https://gist.github.com/binmahone/9a6d8bdff37d94f59dea.
> 
> I checked the job's configuration on the job history page; it seems the
> properties "fact.dict.column.rowkey.indexes" and "fact.dict.column.names"
> were not successfully set in the job's configuration.
> 
> I'm looking into the issue; any comments are welcome.
> 
> On Mon, Aug 3, 2015 at 9:32 PM, Dayue Gao (JIRA) <[email protected]> wrote:
> 
>> 
>>    [
>> https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651856#comment-14651856
>> ]
>> 
>> Dayue Gao commented on KYLIN-921:
>> ---------------------------------
>> 
>> Thanks Hongbin, Shaofeng! Glad to know it has been fixed in 0.7.3 :-)
>> 
>> 
>>> Dimension with all nulls cause BuildDimensionDictionary failed due to FileNotFoundException
>>> -------------------------------------------------------------------------------------------
>>> 
>>>                Key: KYLIN-921
>>>                URL: https://issues.apache.org/jira/browse/KYLIN-921
>>>            Project: Kylin
>>>         Issue Type: Bug
>>>         Components: Job Engine
>>>   Affects Versions: v0.7.2
>>>           Reporter: Dayue Gao
>>>           Assignee: ZhouQianhao
>>>            Fix For: v0.7.3
>>> 
>>>        Attachments: KYLIN-921.patch
>>> 
>>> 
>>> From mailing list
>>> ----------------------
>>> {noformat}
>>> I am building a cube with some lookup table in between and getting
>>> exception at third step of cube build i.e Build Dimension Dictionary with
>>> exception saying
>>> java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
>>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
>>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
>>> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
>>> at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
>>> at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>>> at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>>> at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> {noformat}
>>> The problem is that FactDistinctColumnsMapper's map method skips null
>>> values. As a result, if all values of dimension 'x' are null,
>>> FactDistinctColumnsReducer will not create a file for 'x', and the
>>> following job then throws FileNotFoundException.
>>> {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
>>>    public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
>>>        try {
>>>            // code omitted ...
>>>            for (int i : factDictCols) {
>>>                outputKey.set((short) i);
>>>                fieldSchema = schema.get(flatTableIndexes[i]);
>>>                Object fieldValue = record.get(fieldSchema.getName(), schema);
>>>                // NULL VALUE IS SKIPPED
>>>                if (fieldValue == null)
>>>                    continue;
>>>                // code omitted ...
>>>            }
>>>        } catch (Exception ex) {
>>>            handleErrorRecord(record, ex);
>>>        }
>>>    }
>>> {code}
>> 
>> 
>> 
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>> 
> 
> 
> 
> -- 
> Regards,
> 
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone

