Hi Yu,

How is Kylin retrieving the data? Is it using Hive only for the metadata, or is it also using Hive to retrieve the data itself? If Kylin uses Hive to retrieve the data for the build, then won't the performance of Hive have an impact on Kylin's performance as well?

I've also done some research on the above questions. According to reference [1] (slides 28 and 29), the cube build process is as follows:

Cube build - Steps
1. Build the dictionary from the dimension tables (Hive tables) on local disk, and copy the dictionary to HDFS.
2. Run a Hive query to build a joined, flattened table, which is also called the intermediate Hive table.
3. Run MapReduce jobs to build cuboids in HDFS sequence files, from tier 1 to tier N.
4. Calculate the key distribution of the HDFS sequence files, and evenly split the key space into K regions.
5. Translate the HDFS sequence files into HBase HFiles.
6. Bulk load the HFiles into HBase.

Question 1: In step 2, Kylin runs a Hive query to generate the intermediate Hive table. So Kylin does use Hive to retrieve the data for the cube build. Am I right?

Question 2: Based on my understanding, Kylin only needs to work with Hive in steps 1 and 2, and after that it does not need to retrieve data from the Hive tables for the MapReduce jobs. Is that right?

[1] http://www.slideshare.net/XuJiang2/kylin-hadoop-olap-engine/28?utm_source=slideview&utm_medium=ssemail&utm_campaign=share_clip

Best regards,
Zhong

On Sun, Jan 17, 2016 at 10:35 PM, yu feng <[email protected]> wrote:

> Firstly, kylin do not distinguish which kind table in hive, if only you
> can query it in hive, so the table can be normal table, external table,
> view or table with some serdes.
> then I think it is hard to build cube backward along the time in kylin.
> maybe someone has some good ideas at this point.
>
> 2016-01-18 11:04 GMT+08:00 zhong zhang <[email protected]>:
>
> > Hi All,
> >
> > I'm wondering can I build the Kylin cube backward along the time. More
> > specifically, can I build the cube from the current time to six months ago
> > and then from six months ago to 12 months ago and go on? In this way, I can
> > have the latest six months' cube result first.
> >
> > It's well known that the input of Kylin cube is hive table. Does it make
> > any difference between using internal hive table and external hive table
> > when building the cube?
> >
> > Best regards,
> > Zhong
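
To make step 4 of the build process above (splitting the key space into K regions) a bit more concrete, here is a minimal sketch in Python. It is not Kylin's actual code: the function name `split_points` and the input format (a list of `(key, row_count)` pairs sorted by key) are assumptions made purely for illustration. The idea is to walk the key distribution in order and close a region each time the cumulative row count reaches the next 1/K share of the total.

```python
def split_points(key_counts, k):
    """Return boundary keys that cut a sorted key distribution into at most k regions.

    key_counts: list of (key, row_count) pairs sorted by key (hypothetical input).
    Each returned key is the last key of a region; the final region is open-ended.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    total = sum(count for _, count in key_counts)
    boundaries = []
    running = 0
    next_region = 1  # index of the region share we are trying to fill next

    for key, count in key_counts:
        running += count
        # Integer comparison of running/total >= next_region/k, avoiding floats.
        if next_region < k and running * k >= total * next_region:
            boundaries.append(key)
            # A single heavy key may cover several shares at once; skip them.
            while next_region < k and running * k >= total * next_region:
                next_region += 1

    # A boundary at the very last key would create an empty trailing region.
    if boundaries and boundaries[-1] == key_counts[-1][0]:
        boundaries.pop()
    return boundaries


if __name__ == "__main__":
    distribution = [("a", 10), ("b", 10), ("c", 10), ("d", 10)]
    print(split_points(distribution, 2))  # ['b']
```

With skewed data the sketch returns fewer than K-1 boundaries, because one heavy key can fill several shares by itself. A real implementation would presumably work on sampled keys from the sequence files rather than an in-memory list.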
