KYLIN-2165 will be nice.

Yes, 30% of the total cube build time, because the cardinality of the dimensions was low (2K and 11K).
You are right: when the cardinality of the dimensions is 1M, the intermediate table is only 5% of the total.

Picture (I don't know whether you can see pictures on this mailing list):
[image: Inline images 1]

2016-12-27 2:32 GMT+01:00 ShaoFeng Shi <[email protected]>:

> Alberto, I didn't test the ORC format; but as you know, Kylin consumes the
> source data row by row (all columns at once), so I guess a columnar format
> like ORC may not benefit much. But this is a good try; if there is a better
> format we can switch to it.
>
> The "redistribute flat hive table" step will add time, but it can reduce
> time in the subsequent cube building (it avoids data skew), especially when
> there are lots of records. Usually it is fast (a couple of minutes to ten or
> twenty minutes) compared to the cube build time. You mentioned it took 30%
> of the total time; what's the total time and how many input records are
> there? When the input is small, the overhead may outweigh the benefit.
>
> The method you mentioned (count on the fact table, then move the
> redistribute to step 1) is actually supported in Kylin 1.5.4 (maybe also
> 1.5.3) with a config parameter; but that method is not recommended as it is
> unstable: in some cases (e.g., the fact table is a big hive view, or it is
> a big table but not partitioned by date), a simple "select count(*) from
> fact_table" will cost lots of resources on Hadoop, and a second "create
> intermediate_table as select ..." will start the same mappers again.
>
> In contrast, the as-is method is relatively stable in extreme cases;
> usually the intermediate table is much smaller than the fact table, so
> count and redistribute on it will be low-cost. In the next version there
> will be a further optimization
> (https://issues.apache.org/jira/browse/KYLIN-2165) to reduce the time in
> this step.
>
>
> 2016-12-27 1:20 GMT+08:00 Alberto Ramón <[email protected]>:
>
> > Hello
> >
> > Since v0, I have corrected the English syntax.
> >
> > After tuning the cube:
> > - Use a compressed Hive input table
> > - Define Hierarchy, Joint, Dim
> > - . . .
> >
> > Now: 57% is for the first steps (flat table, steps 1, 2, 3) and 43% is
> > for building the cube.
> >
> > I saw that the flat table uses SEQUENCEFILE, so I tested:
> > ORC,
> > ORC + Snappy
> > ORC + Snappy + Vectorization
> >
> > without good results. Any more ideas?
> >
> > I'm thinking that 'Redistribute Flat Hive Table' is a simple count and
> > uses *30% of total time*.
> > Is this the normal case?
> > Could we approximate this count with the count of the fact table (which
> > will be accurate 99% of the time) and run it in parallel with step 1?
> > Does it need to be precise?
> >
> > 2016-12-22 14:00 GMT+01:00 Li Yang <[email protected]>:
> >
> > > Very good work!
> > >
> > > Btw, we are also doing benchmarks on SSB and TPC-H data sets, based on
> > > the work below. Will share more info soon.
> > >
> > > - http://www.cs.umb.edu/~poneil/StarSchemaB.PDF
> > > - https://github.com/hortonworks/hive-testbench
> > >
> > > Cheers
> > > Yang
> > >
> > > On Wed, Dec 21, 2016 at 8:45 PM, Alberto Ramón <[email protected]>
> > > wrote:
> > >
> > > > When Kylin 2149 <https://issues.apache.org/jira/browse/KYLIN-2149>
> > > > is solved, performance will *improve even more*, because:
> > > >
> > > > You know that 2016-05-05 belongs to May, week 18, and Thursday, but
> > > > Kylin doesn't know it.
> > > > It will try to calculate the combinations of 2016-05-05 with January,
> > > > February, March, ..., Monday, Tuesday, ..., W1, W2, ..., Q2, Q3, Q4
> > > > ==> a lot of combinations are wasted.
> > > >
> > > > 2016-12-21 12:57 GMT+01:00 Luke_Selina <[email protected]>:
> > > >
> > > > > Great, and I agree! But I still have a question, like Alberto: why
> > > > > can one dimension in an aggregation group use only one rule
> > > > > (mandatory, joint, hierarchy)?
> > > > >
> > > > > --
> > > > > View this message in context: http://apache-kylin.74782.x6.nabble.com/Kylin-Performance-tp6713p6728.html
> > > > > Sent from the Apache Kylin mailing list archive at Nabble.com.
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
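
The point quoted above about 2016-05-05 being combined with every month, weekday, week and quarter can be checked with some quick arithmetic. The following is only a rough sketch in Python: the function name and the choice of ISO week numbering are assumptions made for illustration, and it counts value combinations, not how Kylin actually enumerates cuboids.

```python
from datetime import date, timedelta


def date_attribute_combinations(year):
    """Return (naive, valid) counts of (day, month, weekday, week, quarter) tuples.

    naive: every day paired with every month, weekday, ISO week and quarter
           value seen in the year, as if the attributes were independent.
    valid: tuples that really occur, i.e. one per calendar day, because all
           four attributes are fully determined by the day.
    Hypothetical helper for illustration only; not part of Kylin.
    """
    days = []
    current = date(year, 1, 1)
    while current.year == year:
        days.append(current)
        current += timedelta(days=1)

    def quarter(d):
        return (d.month - 1) // 3 + 1

    months = {d.month for d in days}
    weekdays = {d.isoweekday() for d in days}      # 1 = Monday ... 7 = Sunday
    weeks = {d.isocalendar()[1] for d in days}     # ISO week number (assumption)
    quarters = {quarter(d) for d in days}

    naive = len(days) * len(months) * len(weekdays) * len(weeks) * len(quarters)
    valid = len({
        (d, d.month, d.isoweekday(), d.isocalendar()[1], quarter(d))
        for d in days
    })
    return naive, valid


if __name__ == "__main__":
    naive, valid = date_attribute_combinations(2016)
    print(f"naive combinations: {naive}")
    print(f"valid combinations: {valid}")
    print(f"wasted share: {1 - valid / naive:.6f}")
```

Only one tuple per calendar day is meaningful, so nearly all of the naive cross product is waste. That is the motivation given in the thread for letting the engine know that month, week, weekday and quarter are determined by the date.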
