Hi Hongbin,

I'm not sure how your CI is configured, but I did hit the NPE when I tested the patch with the following workflow (it took me quite a long time to figure out). Let's say my Kylin binary is deployed under $DIR/kylin-mt-v0.7.2.1:
1. Stop the old instance: "$DIR/kylin-mt-v0.7.2.1/bin/kylin.sh stop"
2. Generate a new tarball using './script/package.sh' and put it into $DIR.
3. "Replace" the binary: "cd $DIR; tar xzf kylin-mt-v0.7.2.1.tar.gz"
4. Start the new instance and build the cube.

The problem is that $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin/WEB-INF/lib/kylin-job-mt-v0.7.2.1.jar is still the old one. Kylin web uses this jar to submit the job (without the new configuration params), while the reduce task uses lib/kylin-job-mt-v0.7.2.1.jar (the new one) to read these two params, which results in the NPE.

The fix is to remove $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin before step 2.

Best,
Dayue

> On Aug 3, 2015, at 9:40 PM, hongbin ma <[email protected]> wrote:
>
> After I applied the patch on 0.7 staging (commit
> 388488be80d24ecb4d3209726c90a1b829a18087), CI failed due to a
> FactDistinctColumnsJob failure. The error message is at
> https://gist.github.com/binmahone/9a6d8bdff37d94f59dea.
>
> I checked the job's configuration on the job history page; it seems the
> properties "fact.dict.column.rowkey.indexes" and "fact.dict.column.names"
> were not successfully set in the job's configuration.
>
> I'm looking into the issue; any comments are welcome.
>
> On Mon, Aug 3, 2015 at 9:32 PM, Dayue Gao (JIRA) <[email protected]> wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651856#comment-14651856
>> ]
>>
>> Dayue Gao commented on KYLIN-921:
>> ---------------------------------
>>
>> Thanks Hongbin, Shaofeng!
>> Glad to know it has been fixed in 0.7.3 :-)
>>
>>
>>> Dimension with all nulls cause BuildDimensionDictionary failed due to
>>> FileNotFoundException
>>> -------------------------------------------------------------------------------------------
>>>
>>>                 Key: KYLIN-921
>>>                 URL: https://issues.apache.org/jira/browse/KYLIN-921
>>>             Project: Kylin
>>>          Issue Type: Bug
>>>          Components: Job Engine
>>>    Affects Versions: v0.7.2
>>>            Reporter: Dayue Gao
>>>            Assignee: ZhouQianhao
>>>             Fix For: v0.7.3
>>>
>>>         Attachments: KYLIN-921.patch
>>>
>>>
>>> From mailing list
>>> ----------------------
>>> {noformat}
>>> I am building a cube with some lookup table in between and getting
>>> exception at third step of cube build i.e Build Dimension Dictionary with
>>> exception saying
>>> java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
>>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
>>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
>>>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
>>>     at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
>>>     at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>>>     at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>>>     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>>>     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>>>     at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>>     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>>>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> {noformat}
>>> The problem is that FactDistinctColumnsMapper's map method skips null
>>> values. As a result, if all values of dimension 'x' are null,
>>> FactDistinctColumnsReducer will not create a file for 'x', and the
>>> following job then throws FileNotFoundException.
>>> {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
>>> public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
>>>     try {
>>>         // code omitted ...
>>>         for (int i : factDictCols) {
>>>             outputKey.set((short) i);
>>>             fieldSchema = schema.get(flatTableIndexes[i]);
>>>             Object fieldValue = record.get(fieldSchema.getName(), schema);
>>>             // NULL VALUE IS SKIPPED
>>>             if (fieldValue == null)
>>>                 continue;
>>>             // code omitted ...
>>>         }
>>>     } catch (Exception ex) {
>>>         handleErrorRecord(record, ex);
>>>     }
>>> }
>>> {code}
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
