Hi Hongbin,

Thanks for pointing that out!
Have you ported it to 0.7-staging? Is there anything I can help with?

Best,
Dayue

> On Aug 4, 2015, at 12:18 PM, hongbin ma <[email protected]> wrote:
>
> Hi Dayue,
>
> What you described is, in short, regenerating the binary package and
> deploying the new one. This is not our typical development style. For
> developers we suggest following
> http://kylin.incubator.apache.org/docs/development/dev_env.html to set up a
> dev env and run tests after code modification.
>
> The issue I reported yesterday is a bug that arises when no fact column
> uses a dictionary (take the test cube test_kylin_cube_with_slr_empty as an
> example: all of the fact table columns are FKs, which in turn use the
> dictionary on the lookup table's PK, thus bypassing the collection of fact
> table column values).
>
> When factDictColRowKeyIndexes is empty, the following code somehow fails
> to set the property into the job's conf (maybe MR automatically skips empty
> strings):
>
>     jobConf.set(BatchConstants.CFG_FACT_DICT_COLUMN_ROWKEY_INDEXES,
>         joiner.join(factDictColRowKeyIndexes));
>
> So we just need to double-check the property on the mapper/reducer side,
> as 07db22536657dfb278dfd29f616789560b208948 did.
>
>
> On Tue, Aug 4, 2015 at 8:33 AM, Dayue Gao <[email protected]> wrote:
>
>> Hi Hongbin,
>>
>> Not sure how your CI is configured, but the NPE problem indeed occurred
>> when I tested the patch using the following workflow (it took me quite a
>> long time to figure it out). Let's say my kylin binary is deployed under
>> $DIR/kylin-mt-v0.7.2.1
>>
>> 1. stop the old instance. "$DIR/kylin-mt-v0.7.2.1/bin/kylin.sh stop"
>> 2. generate a new tarball using './script/package.sh', put into $DIR
>> 3. "replace" the binary. "cd $DIR; tar xzf kylin-mt-v0.7.2.1.tar.gz"
>> 4. start the new instance and build the cube
>>
>> The problem is that
>> $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin/WEB-INF/lib/kylin-job-mt-v0.7.2.1.jar
>> is still the old one. Kylin web uses this jar to submit the job (without
>> the new configuration params), while the reduce task uses
>> lib/kylin-job-mt-v0.7.2.1.jar (the new one) to read these two params,
>> which results in the NPE.
>>
>> The fix is to remove $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin before
>> step 2.
>>
>> Best,
>> Dayue
>>
>>> On Aug 3, 2015, at 9:40 PM, hongbin ma <[email protected]> wrote:
>>>
>>> After I applied the patch on 0.7 staging (commit
>>> 388488be80d24ecb4d3209726c90a1b829a18087), CI failed due to a
>>> FactDistinctColumnsJob failure. The error message is at
>>> https://gist.github.com/binmahone/9a6d8bdff37d94f59dea.
>>>
>>> I checked the job's configuration on the job history page; it seems the
>>> properties "fact.dict.column.rowkey.indexes" and "fact.dict.column.names"
>>> were not successfully set into the job's configuration.
>>>
>>> I'm looking into the issue; any comments are welcome.
>>>
>>> On Mon, Aug 3, 2015 at 9:32 PM, Dayue Gao (JIRA) <[email protected]> wrote:
>>>
>>>>
>>>> [
>>>> https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651856#comment-14651856
>>>> ]
>>>>
>>>> Dayue Gao commented on KYLIN-921:
>>>> ---------------------------------
>>>>
>>>> Thanks Hongbin, Shaofeng!
>>>> Glad to know it has been fixed in 0.7.3 :-)
>>>>
>>>>
>>>>> Dimension with all nulls cause BuildDimensionDictionary failed due to
>>>>> FileNotFoundException
>>>>> -------------------------------------------------------------------------------------------
>>>>>
>>>>>                 Key: KYLIN-921
>>>>>                 URL: https://issues.apache.org/jira/browse/KYLIN-921
>>>>>             Project: Kylin
>>>>>          Issue Type: Bug
>>>>>          Components: Job Engine
>>>>>    Affects Versions: v0.7.2
>>>>>            Reporter: Dayue Gao
>>>>>            Assignee: ZhouQianhao
>>>>>             Fix For: v0.7.3
>>>>>
>>>>>         Attachments: KYLIN-921.patch
>>>>>
>>>>>
>>>>> From mailing list
>>>>> ----------------------
>>>>> {noformat}
>>>>> I am building a cube with some lookup table in between and getting
>>>>> exception at third step of cube build i.e Build Dimension Dictionary with
>>>>> exception saying
>>>>> java.io.FileNotFoundException: File does not exist:
>>>>> /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
>>>>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
>>>>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
>>>>>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
>>>>>     at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
>>>>>     at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>>>>>     at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>>>>>     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>>>>>     at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>>>>>     at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>>>>     at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>>>>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>>>>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>>>>>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>>>>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>> {noformat}
>>>>> The problem is that FactDistinctColumnsMapper's map method skips null
>>>>> values. As a result, if all values of dimension 'x' are null,
>>>>> FactDistinctColumnsReducer will not create a file for 'x', and the
>>>>> following job then throws FileNotFoundException.
>>>>> {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
>>>>>     public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
>>>>>         try {
>>>>>             // code omitted ...
>>>>>             for (int i : factDictCols) {
>>>>>                 outputKey.set((short) i);
>>>>>                 fieldSchema = schema.get(flatTableIndexes[i]);
>>>>>                 Object fieldValue = record.get(fieldSchema.getName(), schema);
>>>>>                 // NULL VALUE IS SKIPPED
>>>>>                 if (fieldValue == null)
>>>>>                     continue;
>>>>>                 // code omitted ...
>>>>>             }
>>>>>         } catch (Exception ex) {
>>>>>             handleErrorRecord(record, ex);
>>>>>         }
>>>>>     }
>>>>> {code}
>>>>
>>>>
>>>>
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.3.4#6332)
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> *Bin Mahone | 马洪宾*
>>> Apache Kylin: http://kylin.io
>>> Github: https://github.com/binmahone
>>
>>
>>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
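
A note on the empty-property case discussed in this thread: below is a
minimal sketch, in Java, of the kind of defensive read on the mapper/reducer
side that Hongbin describes. It is not the code from commit
07db22536657dfb278dfd29f616789560b208948, which is not shown in the thread.
The class name RowKeyIndexConf, the helper parseIndexes, the plain Map
standing in for the job configuration, and the comma separator are all
assumptions made for illustration; only the property name
"fact.dict.column.rowkey.indexes" comes from the messages above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowKeyIndexConf {
    // Property name as quoted in the thread.
    public static final String KEY = "fact.dict.column.rowkey.indexes";

    // Hypothetical helper: parse a comma-separated list of indexes.
    // A missing (null) or empty property is treated as "no dictionary
    // columns" instead of being dereferenced, which is what would
    // otherwise lead to a NullPointerException.
    public static List<Integer> parseIndexes(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<Integer> indexes = new ArrayList<>();
        for (String part : raw.split(",")) {
            indexes.add(Integer.parseInt(part.trim()));
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Assumption: a plain Map stands in for the job configuration.
        Map<String, String> conf = new HashMap<>();

        // Case 1: the property was never set (or was dropped on the way).
        System.out.println(parseIndexes(conf.get(KEY)));

        // Case 2: the property was set to an empty string.
        conf.put(KEY, "");
        System.out.println(parseIndexes(conf.get(KEY)));

        // Case 3: the normal case with some indexes.
        conf.put(KEY, "0,2,5");
        System.out.println(parseIndexes(conf.get(KEY)));
    }
}
```

The point of the sketch is only that the reader side should not assume the
property is present: whether it is absent or empty, the result is an empty
list, so the mapper/reducer can simply skip dictionary column collection.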
