Re: [jira] [Commented] (KYLIN-921) Dimension with all nulls cause BuildDimensionDictionary failed due to FileNotFoundException

hongbin ma Mon, 03 Aug 2015 21:18:54 -0700

Hi Dayue,

What you described is, in short, regenerating the binary package and deploying
the new one. This is not our typical development style. For developers, we
suggest following
http://kylin.incubator.apache.org/docs/development/dev_env.html to set up a
dev env and run the tests after modifying code.


The issue I reported yesterday is a bug that occurs when no fact table column
uses a dictionary. (Take the test cube test_kylin_cube_with_slr_empty as an
example: all of its fact table columns are FKs, which in turn use the
dictionary on the lookup table's PK, thus bypassing the collection of fact
table column values.)

When factDictColRowKeyIndexes is empty, the following code somehow fails
to set the property in the job's conf. (Maybe MR automatically skips empty
strings.)


*jobConf.set(BatchConstants.CFG_FACT_DICT_COLUMN_ROWKEY_INDEXES,
joiner.join(factDictColRowKeyIndexes)); *

So we just need to double-check the property on the mapper/reducer side,
as commit 07db22536657dfb278dfd29f616789560b208948 did.
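
To illustrate the kind of check I mean, here is a minimal sketch. It is not
the code from that commit: a plain Map stands in for the job conf, and the
class name FactDictColumnIndexes and the method parseIndexes are hypothetical
names made up for this example. Only the property key is taken from the job
configuration discussed in this thread. The idea is that the reading side
treats a missing value and an empty string the same way, as "no columns".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a plain Map stands in for the Hadoop job configuration, and
// FactDictColumnIndexes / parseIndexes are hypothetical names, not Kylin code.
final class FactDictColumnIndexes {

    // Parses a comma-joined index list. A missing (null) or empty value
    // yields an empty list instead of an NPE or a NumberFormatException.
    static List<Integer> parseIndexes(String raw) {
        List<Integer> indexes = new ArrayList<>();
        if (raw == null || raw.trim().isEmpty()) {
            return indexes;
        }
        for (String part : raw.split(",")) {
            indexes.add(Integer.parseInt(part.trim()));
        }
        return indexes;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Joining an empty list produces the empty string, which is the
        // value the submitting side would try to store.
        String joined = String.join(",", new ArrayList<String>());
        conf.put("fact.dict.column.rowkey.indexes", joined);

        // Empty value, absent key, and a normal value are all handled.
        System.out.println(parseIndexes(conf.get("fact.dict.column.rowkey.indexes")));
        System.out.println(parseIndexes(conf.get("some.key.that.was.never.set")));
        System.out.println(parseIndexes("0,2,5"));
    }
}
```

With a check like this, the mapper/reducer does not depend on whether the
empty value actually made it into the conf or was dropped on the way.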


On Tue, Aug 4, 2015 at 8:33 AM, Dayue Gao <[email protected]> wrote:

> Hi Hongbin,
>
> Not sure how your CI is configured, but the NPE problem indeed occurred
> when I tested the patch using the following workflow (took me quite a long
> time to figure it out). Let's say my kylin binary is deployed under
> $DIR/kylin-mt-v0.7.2.1
>
> 1. stop the old instance. "$DIR/kylin-mt-v0.7.2.1/bin/kylin.sh stop"
> 2. generate a new tarball using './script/package.sh', put into $DIR
> 3. "replace" the binary. "cd $DIR; tar xzf kylin-mt-v0.7.2.1.tar.gz"
> 4. start the new instance and build the cube
>
> The problem is that
> $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin/WEB-INF/lib/kylin-job-mt-v0.7.2.1.jar
> is still the old one. Kylin web uses this jar to submit the job (without the
> new configuration params), while the reduce task uses
> lib/kylin-job-mt-v0.7.2.1.jar (the new one) to read these two params, which
> results in an NPE.
>
> The fix is to remove $DIR/kylin-mt-v0.7.2.1/tomcat/webapp/kylin before
> step 2.
>
> Best,
> Dayue
>
> > On Aug 3, 2015, at 9:40 PM, hongbin ma <[email protected]> wrote:
> >
> > After I applied the patch on 0.7 staging (commit
> > 388488be80d24ecb4d3209726c90a1b829a18087), CI failed because
> > FactDistinctColumnsJob failed. The error message is at
> > https://gist.github.com/binmahone/9a6d8bdff37d94f59dea.
> >
> > I checked the job's configuration on the job history page. It seems the
> > properties "fact.dict.column.rowkey.indexes" and "fact.dict.column.names"
> > were not successfully set in the job's configuration.
> >
> > I'm looking into the issue; any comments are welcome.
> >
> > On Mon, Aug 3, 2015 at 9:32 PM, Dayue Gao (JIRA) <[email protected]> wrote:
> >
> >>
> >>    [ https://issues.apache.org/jira/browse/KYLIN-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651856#comment-14651856 ]
> >>
> >> Dayue Gao commented on KYLIN-921:
> >> ---------------------------------
> >>
> >> Thanks Hongbin, Shaofeng! Glad to know it has been fixed in 0.7.3 :-)
> >>
> >>
> >>> Dimension with all nulls cause BuildDimensionDictionary failed due to FileNotFoundException
> >>> -------------------------------------------------------------------------------------------
> >>>
> >>>                Key: KYLIN-921
> >>>                URL: https://issues.apache.org/jira/browse/KYLIN-921
> >>>            Project: Kylin
> >>>         Issue Type: Bug
> >>>         Components: Job Engine
> >>>   Affects Versions: v0.7.2
> >>>           Reporter: Dayue Gao
> >>>           Assignee: ZhouQianhao
> >>>            Fix For: v0.7.3
> >>>
> >>>        Attachments: KYLIN-921.patch
> >>>
> >>>
> >>> From mailing list
> >>> ----------------------
> >>> {noformat}
> >>> I am building a cube with some lookup table in between and getting
> >>> exception at third step of cube build i.e Build Dimension Dictionary with
> >>> exception saying
> >>> java.io.FileNotFoundException: File does not exist: /tmp/kylin-5a2ea405-24a2-45ed-958e-2a7fddd8cc97/sc_o2s_metrics_verified123455/fact_distinct_columns/SC
> >>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
> >>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
> >>> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
> >>> at org.apache.kylin.dict.lookup.FileTable.getSignature(FileTable.java:62)
> >>> at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
> >>> at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
> >>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
> >>> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> >>> at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >>> at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> >>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
> >>> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
> >>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>> at java.lang.Thread.run(Thread.java:745)
> >>> {noformat}
> >>> The problem is that FactDistinctColumnsMapper's map method skips null
> >>> values. As a result, if all values of dimension 'x' are null,
> >>> FactDistinctColumnsReducer will not create a file for 'x', and the
> >>> following job then throws FileNotFoundException.
> >>> {code:title=FactDistinctColumnsMapper.java|borderStyle=solid}
> >>> public void map(KEYIN key, HCatRecord record, Context context) throws IOException, InterruptedException {
> >>>        try {
> >>>            // code omitted ...
> >>>            for (int i : factDictCols) {
> >>>                outputKey.set((short) i);
> >>>                fieldSchema = schema.get(flatTableIndexes[i]);
> >>>                Object fieldValue = record.get(fieldSchema.getName(), schema);
> >>>                // NULL VALUE IS SKIPPED
> >>>                if (fieldValue == null)
> >>>                    continue;
> >>>                // code omitted ...
> >>>            }
> >>>        } catch (Exception ex) {
> >>>            handleErrorRecord(record, ex);
> >>>        }
> >>>    }
> >>> {code}
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.3.4#6332)
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
>
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
