The 2 million threshold is not about the number of records. The 2 million cap in DictionaryGenerator applies to the cardinality of a column, that is, how many distinct values the column has. If you really do have a column with more than 2 million distinct values, you can choose to disable the dictionary on that column, and Kylin will still build the cube correctly.
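To make the difference between record count and cardinality concrete, here is a minimal Python sketch. It is not Kylin code: the names `CARDINALITY_CAP`, `column_cardinalities` and `columns_over_cap` are made up for illustration, and the sample rows are invented. It only shows that a table can have many rows while a column has few distinct values.

```python
from collections import defaultdict

# Hypothetical constant for illustration only; it mirrors the 2 million
# figure discussed above and is not read from Kylin.
CARDINALITY_CAP = 2_000_000


def column_cardinalities(rows):
    """Return {column name: number of distinct values} for an iterable of dict rows."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


def columns_over_cap(cardinalities, cap=CARDINALITY_CAP):
    """List the columns whose distinct-value count exceeds the cap."""
    return sorted(col for col, count in cardinalities.items() if count > cap)


# Invented sample data: 4 records, but "country" has only 2 distinct values.
rows = [
    {"country": "CN", "user_id": 1},
    {"country": "CN", "user_id": 2},
    {"country": "US", "user_id": 3},
    {"country": "US", "user_id": 4},
]

cards = column_cardinalities(rows)
print("records:", len(rows))
print("cardinality per column:", cards)
print("columns over cap:", columns_over_cap(cards))
```

In this toy data the record count is 4 while the cardinality of country is 2, so a 1 billion row table is fine as long as no single dictionary-encoded column has more than 2 million distinct values.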

On Tue, May 19, 2015 at 7:49 PM, Luke Han <[email protected]> wrote:

> Hi Tim,
>     Please make sure your Hive/Hadoop/HBase setup works well from the shell
> first. Once you have loaded data into Hive, please run some Hive queries
> from the shell. My first guess is that your user account "root" does not
> have permission to execute some HDFS/Hive commands. Which Hadoop
> distribution are you using, CDH or HDP?
>     Please pick a user account with full permissions on HDFS/Hive/HBase,
> then you should be fine.
>
>     Back to your question #3: big data does not mean querying a "big"
> result in one go ;-) Kylin is designed for OLAP, not for OLTP, ETL, or ML.
> Most analytics cases fetch only a small result set from big source data,
> using filtering and other conditions to bring it down to a reasonable size
> for "people" to get insight. 2m rows is already "huge" for one query.
>
>     Hope these answers help you a little bit.
>     Please feel free to leave your questions here again :)
>
> Thanks.
>
> Luke
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> 2015-05-18 15:10 GMT+08:00 梁嘉明 <[email protected]>:
>
> > Hello, I'm a student from SCUT. I'm interested in big data. Recently I
> > have wanted to test the big data query ability of Kylin, and I ran into
> > some problems that have bothered me for a month. I'm new here and eager
> > to get your help, thank you!
> >
> > 1. I loaded tables into Hive successfully with the ETL tool Kettle, but
> > when I try to load them into Kylin, it reports success while the table
> > size is null. So I tried adding my table SQL to Kylin's
> > create_sample_tables.sql. The tables were created successfully this
> > time, but this is obviously not a good way to create tables. What is
> > the difference between these two methods? And how can I load tables
> > from Hive into Kylin in the normal way?
> >
> > 2. When I build the sample cube, it hits an error on the penultimate
> > step. The error information is:
> > org.apache.hadoop.security.AccessControlException: Permission denied.
> > user=root is not the owner of inode=null. It seems like a permission
> > problem, and I can't find a way out.
> >
> > 3. I want to test the big data query ability on about 1 billion
> > records, but Kylin sets its threshold value at 2 million. I found the
> > threshold setting in the source code, DictionaryGenerator.java. Are
> > there any other threshold settings? And what is the next step after I
> > change the threshold value?
> >
> > yours,
> > sincerely.
> >
> > tim.ljm
