Hi hongbin,

Thanks for pointing that out!

Have you ported it to 0.7-staging? Is there anything I can help with?

Best,
Dayue

> On Aug 4, 2015, at 12:18 PM, hongbin ma <[email protected]> wrote:
> 
> hi, dayue,
> 
> What you described is, in short, regenerating the binary package and
> deploying the new one, which is not our typical development style. For
> developers we suggest following
> http://kylin.incubator.apache.org/docs/development/dev_env.html to set up a
> dev env and run tests after code modification.
> 
> The issue I reported yesterday is a bug that occurs when no fact table
> column uses a dictionary. (Take test cube test_kylin_cube_with_slr_empty as
> an example: all of the fact table columns are FKs, which in turn use the
> dictionary on the lookup table's PK, thus bypassing the collection of fact
> table column values.)
> 
> When factDictColRowKeyIndexes is empty, the following code somehow fails
> to set the property in the job's conf. (Maybe MR automatically skips empty
> strings.)
> 
> 
> *jobConf.set(BatchConstants.CFG_FACT_DICT_COLUMN_ROWKEY_INDEXES,
> joiner.join(factDictColRowKeyIndexes)); *
> 
> so we just need to double check the property at mapper/reducer side
> as 07db22536657dfb278dfd29f616789560b208948 did.
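
A minimal sketch of the mapper/reducer-side double check described above, not the code from the referenced commit. It assumes the indexes are joined with commas, and it uses a plain Map as a stand-in for the job conf; the class name FactDictIndexes and the parse helper are hypothetical. The only name taken from this thread is the property key.

```java
import java.util.HashMap;
import java.util.Map;

public class FactDictIndexes {
    // Property name as quoted in this thread.
    static final String KEY = "fact.dict.column.rowkey.indexes";

    // Treat a missing or empty property as "no fact column uses a dictionary"
    // instead of letting a null value cause an NPE later on.
    // Assumption: the indexes are comma-separated.
    static int[] parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return new int[0];
        }
        String[] parts = raw.split(",");
        int[] result = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Integer.parseInt(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the job conf; the property was never set here.
        Map<String, String> conf = new HashMap<>();
        System.out.println(parse(conf.get(KEY)).length);

        // Same lookup once the property carries three indexes.
        conf.put(KEY, "0,2,5");
        System.out.println(parse(conf.get(KEY)).length);
    }
}
```

With this kind of guard, an unset property and an empty one both yield an empty index array, so the task simply has no fact columns to process.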
> 
> 
> On Tue, Aug 4, 2015 at 8:33 AM, Dayue Gao <[email protected]> wrote:
> 
>> Hi Hongbin,
>> 
>> Not sure how your CI is configured, but the NPE problem indeed occurred
>> when I tested the patch using the following workflow (took me quite a long
>> time to figure it out). Let's say my kylin binary is deployed under
>> $DIR/kylin-mt-v0.7.2.1
>> 
>> 1. stop the old instance. "$DIR/kylin-mt-v0.7.2.1/bin/kylin.sh stop"
>> 2. generate a new tarball using './script/package.sh', put into $DIR
>> 3. "replace" the binary. "cd $DIR; tar xzf kylin-mt-v0.7.2.1.tar.gz"
>> 4. start the new instance and build the cube
>> 
>> The problem is
>> $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin/WEB-INF/lib/kylin-job-mt-v0.7.2.1.jar
>> is still the old one. Kylin web uses this jar to submit the job (without the
>> new configuration params), while the reduce task uses
>> lib/kylin-job-mt-v0.7.2.1.jar (the new one) to read these two params, which
>> results in an NPE.
>> 
>> The fix is to remove $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin before
>> step 2.
>> 
>> Best,
>> Dayue
>> 
>>> On Aug 3, 2015, at 9:40 PM, hongbin ma <[email protected]> wrote:
>>> 
>>> After I applied the patch on 0.7 staging (commit
>>> 388488be80d24ecb4d3209726c90a1b829a18087), CI failed due to a
>>> FactDistinctColumnsJob failure. The error message is at
>>> https://gist.github.com/binmahone/9a6d8bdff37d94f59dea.
>>> 
>>> I checked the job's configuration on the job history page; it seems the
>>> properties "fact.dict.column.rowkey.indexes" and "fact.dict.column.names"
>>> were not successfully set in the job's configuration.
>>> 
>>> I'm looking into the issue; any comments are welcome.
>>> 
>>> On Mon, Aug 3, 2015 at 9:32 PM, Dayue Gao (JIRA) <[email protected]> wrote:
>>> 
>>>> 
>>>>   [ https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651856#comment-14651856 ]
>>>> 
>>>> Dayue Gao commented on KYLIN-921:
>>>> ---------------------------------
>>>> 
>>>> Thanks Hongbin, Shaofeng! Glad to know it has been fixed in 0.7.3 :-)
>>>> 
>>>> 
>>>>> Dimension with all nulls cause BuildDimensionDictionary failed due to FileNotFoundException
>>>>> -------------------------------------------------------------------------------------------
>>>>> 
>>>>>               Key: KYLIN-921
>>>>>               URL: https://issues.apache.org/jira/browse/KYLIN-921
>>>>>           Project: Kylin
>>>>>        Issue Type: Bug
>>>>>        Components: Job Engine
>>>>>  Affects Versions: v0.7.2
>>>>>          Reporter: Dayue Gao
>>>>>          Assignee: ZhouQianhao
>>>>>           Fix For: v0.7.3
>>>>> 
>>>>>       Attachments: KYLIN-921.patch
>>>>> 
>>>>> 
>>>>> From mailing list
>>>>> ----------------------
>>>>> {noformat}
>>>>> I am building a cube with some lookup table in between and getting
>>>>> exception at third step of cube build i.e Build Dimension Dictionary with
>>>>> exception saying
>>>>> java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
>>>>> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
>>>>> at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
>>>>> at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
>>>>> at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
>>>>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
>>>>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>>>>> at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
>>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>>>> at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>>>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>>>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>>>>> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> {noformat}
>>>>> The problem is that FactDistinctColumnsMapper's map method skips null
>>>>> values. As a result, if all values of dimension 'x' are null,
>>>>> FactDistinctColumnsReducer will not create a file for 'x', and the
>>>>> following job then throws FileNotFoundException.
>>>>> {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
>>>>> public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
>>>>>       try {
>>>>>           // code omitted ...
>>>>>           for (int i : factDictCols) {
>>>>>               outputKey.set((short) i);
>>>>>               fieldSchema = schema.get(flatTableIndexes[i]);
>>>>>               Object fieldValue = record.get(fieldSchema.getName(), schema);
>>>>>               // NULL VALUE IS SKIPPED
>>>>>               if (fieldValue == null)
>>>>>                   continue;
>>>>>               // code omitted ...
>>>>>           }
>>>>>       } catch (Exception ex) {
>>>>>           handleErrorRecord(record, ex);
>>>>>       }
>>>>>   }
>>>>> {code}
>>>> 
>>>> 
>>>> 
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.3.4#6332)
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> 
>>> *Bin Mahone | 马洪宾*
>>> Apache Kylin: http://kylin.io
>>> Github: https://github.com/binmahone
>> 
>> 
>> 
> 
> 
> -- 
> Regards,
> 
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone

