Kylin handles star schemas well, but you may encounter issues like OOM in your case. How many large lookup tables do you have? I'm not sure an eviction policy will help, because any time a SQL query involves a lookup table, that table's snapshot will have to be loaded again (so the snapshots keep swapping in and out).
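
To illustrate the swapping concern, here is a toy sketch in Python. It is not Kylin code: `SnapshotCache`, `load_snapshot` and the table names are made up for illustration, and a plain LRU policy is assumed. When queries keep cycling over more lookup tables than the cache can hold, every access ends up reloading a snapshot.

```python
from collections import OrderedDict


class SnapshotCache:
    """Toy LRU cache for lookup-table snapshots (hypothetical, not Kylin's SnapshotManager)."""

    def __init__(self, capacity, loader):
        self.capacity = capacity      # max number of snapshots kept in memory
        self.loader = loader          # function that (expensively) loads a snapshot
        self.entries = OrderedDict()  # table name -> snapshot, least recently used first
        self.loads = 0                # number of times a snapshot had to be (re)loaded

    def get(self, table):
        if table in self.entries:
            self.entries.move_to_end(table)   # mark as most recently used
            return self.entries[table]
        self.loads += 1
        snapshot = self.loader(table)
        self.entries[table] = snapshot
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used snapshot
        return snapshot


def load_snapshot(table):
    # Stand-in for reading a whole lookup table into memory.
    return {"table": table}


def count_loads(capacity, queries):
    """Run the queries against a fresh cache and report how many loads happened."""
    cache = SnapshotCache(capacity, load_snapshot)
    for table in queries:
        cache.get(table)
    return cache.loads


# Queries keep cycling over three large lookup tables.
queries = ["orders", "customers", "stores"] * 4

thrashing_loads = count_loads(2, queries)  # cache smaller than the working set
fitting_loads = count_loads(3, queries)    # cache holds all three tables
```

With room for only two snapshots, all 12 accesses trigger a reload; with room for three, each table is loaded once. So eviction only helps if the tables that are actually queried fit in memory together.
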

One way to solve the problem is to join your tables into a flattened table using a Hive view, providing Kylin with a single big fact table. Also, please avoid using a dictionary on high-cardinality columns.

On Tue, Sep 1, 2015 at 11:16 PM, Abhilash L L <[email protected]> wrote:

> Thanks for replying Hongbin,
>
> for 1) we are trying to add some sort of eviction based cache instead
> of a map. However, we still are trying to figure out what to do for 3).
>
> What is the general advice ? The case here is .. I have order details
> as a fact and order as a dimension and also customer. Now each of these
> will run into many millions. Also, the f-key is not a long/bigint, its a
> string which is a combination of our custom columns. Making it a dictionary
> will not work as we understand. Please suggest what should be the approach
> taken
>
> Regards,
> Abhilash
>
> On Tue, Sep 1, 2015 at 4:37 PM, hongbin ma <[email protected]> wrote:
>
> > for 1) .. seems like only the resource path / table desc etc is only
> > kept in memory while a new lookupstringtable is created per query/request
> > which holds onto data for the lifetime of the request. So once the request
> > is done, it should be garbage collectable ?
> >
> > /table is just for the hive table's schema, the look up table content is
> > cached in SnapshotManager and it will not be evicted so far. So if you have
> > a lot of large lookup tables this will be a problem
> >
> > 3) Also the derived filter translator, is there a way to modify the
> > 'IN_THRESHOLD' via config file ?
> >
> > Are you facing performance issue with a lot of IN clauses? if so, please
> > take a look at https://issues.apache.org/jira/browse/KYLIN-740, the patch
> > will be merged into next release
> >
> > On Mon, Aug 31, 2015 at 9:54 PM, Abhilash L L <[email protected]>
> > wrote:
> >
> > > Sorry for the confusion,
> > >
> > > for 1) .. seems like only the resource path / table desc etc is only
> > > kept in memory while a new lookupstringtable is created per query/request
> > > which holds onto data for the lifetime of the request. So once the request
> > > is done, it should be garbage collectable ?
> > >
> > > 3) Also the derived filter translator, is there a way to modify the
> > > 'IN_THRESHOLD' via config file ?
> > >
> > > Regards,
> > > Abhilash
> > >
> > > On Mon, Aug 31, 2015 at 7:05 PM, Abhilash L L <[email protected]>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We started noticing that Kylin tomcat server is taking a lot of ram.
> > > > It even hit a limit of 10GB.
> > > >
> > > > After spending some time by going over the code, it seems like the
> > > > cube enumerator is not storing anything in memory. But the Lookup table
> > > > enumerator seems to be loading all records and storing it in memory.
> > > >
> > > > 1) What happens when there are lot of projects defined and we end up
> > > > with tons of look up tables across them. Does it get swapped out
> > > > automatically ? I am not able to track where eviction is happening. The
> > > > snapshot manager has a 'removeSnapshot' but its intent seems different to
> > > > me.
> > > >
> > > > 2) How do we handle really higher cardinality dimension. Eg: If I have
> > > > sales as a fact and customers as a dimension, there will be millions of
> > > > customers. However a store is good candidate to keep in memory but not
> > > > customers. Whats the recommended setting while creating the cube to handle
> > > > such a case
> > > >
> > > > Regards,
> > > > Abhilash
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
