Lookup table's dictionary will be in memory, but not for each time, it should be shared for all queries.
Kylin supports high cardinality dimension so far (not perfect), will have one enhancement to support such thing. *Please refer to other mail thread about this.* Thanks. Best Regards! --------------------- Luke Han On Tue, Jul 28, 2015 at 6:28 PM, Dayue Gao <[email protected]> wrote: > Excellent explanation! Thanks you Hongbin! > > > Afterwards, Kylin will replace DimA values with DimB values(since both of > > them are dimensions on lookup table, Kylin can load the whole lookup > table > > into memory and build a mapping for them) > > A further question, does it means Kylin will load all needed lookup tables > for such queries each time? If that's the case, I think I should avoid > using derived dimension on big dimension like users, right? > > Regards, > Dayue > > > > 在 2015年7月28日,下午5:52,hongbin ma <[email protected]> 写道: > > > > derived column is used when one dimension can be deduced from another. > *(They > > must be dimension on lookup table)* > > for example, suppose we have a lookup table: > > > > dimA dimB dimC > > 1 a ? > > 2 b ? > > 3 c ? > > 4 a ? > > > > in this case, given a value in DimA, the value of DimB is determined, so > we > > say dimB can be derived from DimA. When we build a cube that contains > both > > DimA and DimB, we simple include DimA, and marking DimB as derived. > Derived > > column(DimB) does not participant in cuboid spanning, thus reducing the > > possible combinations: > > > > original combinations: > > ABC,AB,AC,BC,A,B,C > > > > combinations when driving B from A: > > AC,A,C > > > > at Runtime, in case queries like "select count(*) from fact_table inner > > join looup1 group by looup1 .dimB", it is expecting cuboid containing > DimB > > to answer the query. However, DimB will appear in none of the cuboids due > > to derived optimization. In this case, we modify the execution plan to > make > > it group by DimA(its deriving dimension) first, we'll get intermediate > > answer like: > > > > DimA count(*) > > 1 1 > > 2 1 > > 3 1 > > 4 1 > > > > Afterwards, Kylin will replace DimA values with DimB values(since both of > > them are dimensions on lookup table, Kylin can load the whole lookup > table > > into memory and build a mapping for them), and the intermediate result > > becomes: > > > > DimB count(*) > > a 1 > > b 1 > > c 1 > > a 1 > > > > After this, the runtime SQL engine(calcite) will further aggregate the > > intermediate result to: > > > > DimB count(*) > > a 2 > > b 1 > > c 1 > > > > this step happens at query runtime, this is what it means "at the cost of > > extra runtime aggregation" > > > > On Tue, Jul 28, 2015 at 11:21 AM, Dayue Gao <[email protected] <mailto: > [email protected]>> wrote: > > > >> Hi Kylin Developers, > >> > >> I'm confused about when to use derived dimension and how it will affects > >> query performance. So far I've known that derived dimension will not be > >> included in row key and value, then how kylin achieves to implement > query > >> grouping by these dimensions? The "Design Cube in Kylin" document says > >> "Derived dimension reduces combination from 2N to 2 at the cost of extra > >> runtime aggregation." I find it hard to understand. > >> > >> Also, it's not clear to me what is the requirement for using derived > >> dimension. In "Add Dimension" step, there is a tip saying "Using Derived > >> for One-One relationship between columns, like ID and Name". Does it > mean > >> derived dimension should be unique key of the lookup table? If that's > the > >> case, the applicable scenario seams limited. > >> > >> I'm aware that there's a JIRA > >> https://issues.apache.org/jira/browse/KYLIN-887 < > >> https://issues.apache.org/jira/browse/KYLIN-887 < > https://issues.apache.org/jira/browse/KYLIN-887>> for it, but I can't wait > >> to know the answer :-) > >> > >> Regards, > >> Dayue > > > > > > > > > > -- > > Regards, > > > > *Bin Mahone | 马洪宾* > > Apache Kylin: http://kylin.io <http://kylin.io/> > > Github: https://github.com/binmahone <https://github.com/binmahone> >
