The 2 million threshold is not about the number of records.

The 2 million cap in DictionaryGenerator applies to the cardinality of a
column, that is, how many distinct values the column has. If you really do
have a column with more than 2 million distinct values, you can disable the
dictionary on that column, and Kylin will still build the cube correctly.
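
To make the difference between record count and cardinality concrete, here
is a small Python sketch. It is not Kylin code: the function names and the
sample data are made up for illustration, and only the 2 million figure
comes from this thread.

```python
# Illustrative sketch only; this does not reproduce Kylin's DictionaryGenerator.
# CARDINALITY_CAP mirrors the 2 million figure discussed in this thread;
# all function and column names here are hypothetical.
CARDINALITY_CAP = 2_000_000


def column_cardinality(rows, column):
    """Return the number of distinct values found in one column."""
    return len({row[column] for row in rows})


def exceeds_cap(rows, column, cap=CARDINALITY_CAP):
    """True if the column has more distinct values than the cap allows."""
    return column_cardinality(rows, column) > cap


# 1000 records: 'country' repeats three values, 'order_id' is unique per row.
rows = [{"order_id": i, "country": ("CN", "US", "DE")[i % 3]} for i in range(1000)]

record_count = len(rows)
country_cardinality = column_cardinality(rows, "country")
order_id_cardinality = column_cardinality(rows, "order_id")
```

A table can hold a billion records and still have low-cardinality columns
such as country. Only a column like order_id, whose distinct-value count
grows with the data, would get near the cap.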

On Tue, May 19, 2015 at 7:49 PM, Luke Han <[email protected]> wrote:

> Hi Tim,
>     Please first make sure that Hive, Hadoop, and HBase all work correctly
> from the shell. Once you have loaded data into Hive, run a few Hive queries
> from the shell. My first guess is that your user account "root" does not
> have permission to execute some HDFS/Hive commands. Which Hadoop
> distribution are you using, CDH or HDP?
>     Please pick a user account with full permissions on HDFS/Hive/HBase,
> and then you should be fine.
>
>     Back to your question #3: big data does not mean querying a "big"
> result in one go ;-) Kylin is designed for OLAP, not for OLTP, ETL, or
> ML. Most analytics cases fetch only a small result set from big source
> data, using filters and other conditions to bring it down to a reasonable
> size for people to get insight from. 2 million rows is already "huge" for
> a single query.
>
>     I hope these answers help a little.
>     Please feel free to post further questions here :)
>
>     Thanks.
>
> Luke
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> 2015-05-18 15:10 GMT+08:00 梁嘉明 <[email protected]>:
>
> > Hello, I'm a student from SCUT and I'm interested in big data. Recently
> > I have been trying to test Kylin's big data query ability, and I ran
> > into some problems that have bothered me for a month. I'm new here and
> > eager to get your help. Thank you!
> >
> >
> > 1. I loaded tables into Hive successfully with the ETL tool Kettle, but
> > when I tried to load them into Kylin, it reported success but the table
> > size was null. So I tried adding my table SQL to Kylin's
> > create_sample_tables.sql. The tables were created successfully this
> > time, but this is obviously not a good way to create tables. What is
> > the difference between these two methods? And how can I load tables
> > from Hive into Kylin in the normal way?
> >
> >
> > 2. When I build the sample cube, it fails on the penultimate step. The
> > error message is: org.apache.hadoop.security.AccessControlException:
> > Permission denied. user=root is not the owner of inode=null. It seems
> > like a permission problem, and I can't find a way around it.
> >
> >
> > 3. I want to test the big data query ability with about 1 billion
> > records, but Kylin sets its threshold to 2 million. I found the
> > threshold setting in the source file DictionaryGenerator.java. Are
> > there any other threshold settings? And what should I do next after I
> > change the threshold value?
> >
> >                                                            Yours sincerely,
> >
> >
> >  tim.ljm
>
