Re: Lookup Table Enumerator high memory

hongbin ma Tue, 01 Sep 2015 08:24:24 -0700

Kylin handles star schemas well, but it may run into issues like OOM in your
case.
How many large lookup tables do you have?
I'm not sure an eviction policy will help: whenever a SQL query involves a
lookup table, that table's snapshot has to be loaded again (so the snapshots
would keep swapping in and out).
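
To make the swapping concern concrete, here is a minimal sketch of an
entry-count-bounded LRU cache in Java. It is not Kylin's SnapshotManager code:
the class LruSnapshotCache, its loader function, and the maxEntries limit are
hypothetical names chosen only for illustration, and a real snapshot cache
would more likely bound memory by size than by entry count.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not Kylin's SnapshotManager.
class LruSnapshotCache<K, V> {
    private final int maxEntries;          // assumed limit on cached snapshots
    private final Function<K, V> loader;   // stands in for "load snapshot from storage"
    private final LinkedHashMap<K, V> entries;
    private int loadCount = 0;             // how many times the loader was invoked

    LruSnapshotCache(int maxEntries, Function<K, V> loader) {
        this.maxEntries = maxEntries;
        this.loader = loader;
        // accessOrder = true orders entries from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > LruSnapshotCache.this.maxEntries;
            }
        };
    }

    // Returns the cached value, reloading it if it is absent or was evicted.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loadCount++;
            entries.put(key, value);
        }
        return value;
    }

    synchronized int getLoadCount() {
        return loadCount;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

If three lookup tables are queried in rotation against a cache limited to two
entries, every access misses and triggers a reload, which is the
swapping-in-swapping-out pattern described above. With room for all three,
each snapshot is loaded only once.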


One way to solve the problem is to join your tables into a flattened table
using a Hive view, providing Kylin with a single big fact table. Also, please
avoid using dictionaries on high-cardinality columns.

On Tue, Sep 1, 2015 at 11:16 PM, Abhilash L L <[email protected]> wrote:

> Thanks for replying Hongbin,
>
>      for 1) we are trying to add some sort of eviction-based cache instead
> of a map. However, we are still trying to figure out what to do for 3).
>
>     What is the general advice? The case here is: I have order details
> as a fact, with order and customer as dimensions. Each of these will run
> into many millions. Also, the f-key is not a long/bigint; it's a string
> that is a combination of our custom columns. As we understand it, making
> it a dictionary will not work. Please suggest what approach we should
> take.
>
> Regards,
> Abhilash
>
> On Tue, Sep 1, 2015 at 4:37 PM, hongbin ma <[email protected]> wrote:
>
> >     for 1) ..  seems like only the resource path / table desc etc is only
> > kept in memory while a new lookupstringtable is created per query/request
> > which holds onto data for the lifetime of the request.  So once the
> > request is done, it should be garbage collectable ?
> >
> > /table holds only the Hive table's schema; the lookup table content is
> > cached in SnapshotManager and is currently never evicted. So if you have
> > a lot of large lookup tables, this will be a problem.
> >
> >
> > 3) Also the derived filter translator, is there a way to modify the '
> > IN_THRESHOLD'  via config file ?
> >
> > Are you facing performance issues with a lot of IN clauses? If so, please
> > take a look at https://issues.apache.org/jira/browse/KYLIN-740; the patch
> > will be merged into the next release.
> >
> > On Mon, Aug 31, 2015 at 9:54 PM, Abhilash L L <[email protected]>
> > wrote:
> >
> > > Sorry for the confusion,
> > >
> > >     for 1) ..  seems like only the resource path / table desc etc is
> > > only kept in memory while a new lookupstringtable is created per
> > > query/request which holds onto data for the lifetime of the request.
> > > So once the request is done, it should be garbage collectable ?
> > >
> > >
> > > 3) Also the derived filter translator, is there a way to modify the '
> > > IN_THRESHOLD'  via config file ?
> > >
> > >
> > >
> > >
> > >
> > > Regards,
> > > Abhilash
> > >
> > > On Mon, Aug 31, 2015 at 7:05 PM, Abhilash L L <[email protected]>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > >     We started noticing that the Kylin Tomcat server is taking a lot
> > > > of RAM. It even hit a limit of 10GB.
> > > >
> > > >     After spending some time going over the code, it seems like the
> > > > cube enumerator is not storing anything in memory. But the lookup
> > > > table enumerator seems to be loading all records and storing them in
> > > > memory.
> > > >
> > > >     1) What happens when there are a lot of projects defined and we
> > > > end up with tons of lookup tables across them? Do they get swapped
> > > > out automatically? I am not able to track where eviction is
> > > > happening. The snapshot manager has a 'removeSnapshot', but its
> > > > intent seems different to me.
> > > >
> > > >     2) How do we handle really high-cardinality dimensions? E.g., if
> > > > I have sales as a fact and customers as a dimension, there will be
> > > > millions of customers. A store is a good candidate to keep in memory,
> > > > but customers are not. What's the recommended setting while creating
> > > > the cube to handle such a case?
> > > >
> > > > Regards,
> > > > Abhilash
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
