You should first figure out whether this dimension really suits dictionary
encoding. If a dimension's cardinality runs into the millions, the size of
its built dictionary will get out of control. The bad news is that Kylin
caches the dictionary in the query server's heap as well as in the heap of
some of the MR mappers, which can cause performance issues.

Do you have sample data for this dimension? Maybe you should consider
fixed_length encoding or integer encoding, rather than dict encoding, for
this specific dimension.
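
For illustration only, a minimal sketch of what that could look like in the
cube descriptor's rowkey section -- the column name ROWKEY is taken from the
stack trace below, and the length 8 is an assumption you would tune to your
data:

    "rowkey": {
      "rowkey_columns": [
        { "column": "ROWKEY", "encoding": "integer:8" }
      ]
    }

If the values are not integral, an encoding like "fixed_length:20" stores a
fixed 20-byte slice of each value instead (longer values get truncated, so
pick a length that keeps the values distinct).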

On Thu, Dec 8, 2016 at 10:20 PM, Alberto Ramón <[email protected]>
wrote:

> Hmm, you can try this:
>
> With KYLIN-1705 <https://issues.apache.org/jira/browse/KYLIN-1705> you can
> use the Global Dictionary Builder, which supports 2 billion values (versus
> the previous dictionary limit of 5 million).
>
> In theory you can migrate from the old dictionaries (KYLIN-1775
> <https://issues.apache.org/jira/browse/KYLIN-1775>)
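>
> For reference, a minimal sketch of what enabling it could look like in the
> cube descriptor, assuming the column is ROWKEY (the builder class is the
> one KYLIN-1705 introduces; the column name is just an example taken from
> this thread):
>
>     "dictionaries": [
>       {
>         "column": "ROWKEY",
>         "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder"
>       }
>     ]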
>
> 2016-12-08 7:57 GMT+01:00 [email protected] <[email protected]>:
>
> >     I upgraded from version 1.5.4.1 to 1.6.0, modified KYLIN_HOME,
> > and changed "kylin.dictionary.max.cardinality=5000000" to
> > "kylin.dictionary.max.cardinality=30000000" in the file kylin.properties,
> > then started Kylin 1.6 --> create model --> create cube --> build cube.
> >    I got the following error message:
> >
> > java.lang.RuntimeException: Failed to create dictionary on DEFAULT.TEST_500W_TBL.ROWKEY
> > at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:325)
> > at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:222)
> > at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
> > at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
> > at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:54)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
> > at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
> > at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.IllegalArgumentException: Too high cardinality is not suitable for dictionary -- cardinality: 5359970
> > at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:96)



-- 
Regards,

*Bin Mahone | 马洪宾*
