Re: internal hive table and build the cube backward

zhong zhang Tue, 19 Jan 2016 11:29:02 -0800

Hi Yu,

How does Kylin retrieve the data? Does it use Hive only for the metadata,
or does it also use Hive to retrieve the data itself?
If Kylin uses Hive to retrieve the data for the build, won't Hive's
performance affect Kylin's performance as well?

I've also done some research on the above questions.

Based on reference [1] (slides 28 and 29), the cube build process looks
like this:

Cube build - Steps

1. Build the dictionary from the dimension tables (Hive tables) on local
disk, then copy the dictionary to HDFS.

2. Run a Hive query to build a joined, flattened table, also called the
intermediate Hive table.

3. Run MapReduce jobs to build cuboids in HDFS sequence files, from tier 1
to tier N.

4. Calculate the key distribution of the HDFS sequence files, and evenly
split the key space into K regions.

5. Translate HDFS sequence files into HBase HFile

6. Bulk load the HFile into HBase
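
To check my understanding of step 4, here is a minimal sketch of how a key
space could be split evenly into K regions from a key distribution. This is
only my own illustration, not Kylin's actual code: the function name
split_points and the (row_key, row_count) input format are assumptions I
made up for the example.

```python
def split_points(key_counts, k):
    """Pick up to k-1 split keys so each region holds a similar number of rows.

    key_counts: list of (row_key, row_count) pairs sorted by row_key
                (a hypothetical summary of the key distribution).
    Each returned key is the last key of a region (inclusive upper bound).
    """
    total = sum(count for _, count in key_counts)
    splits = []
    running = 0
    next_boundary = 1  # index of the next region boundary to place

    # The last key is skipped: splitting after it would create an empty region.
    for key, count in key_counts[:-1]:
        running += count
        # Place a boundary once the running total reaches next_boundary/k of all rows.
        if next_boundary < k and running * k >= total * next_boundary:
            splits.append(key)
            # A very frequent key may cross several boundaries at once; skip them.
            while next_boundary < k and running * k >= total * next_boundary:
                next_boundary += 1
    return splits
```

With a skewed distribution the sketch returns fewer than K-1 split keys, so
the regions stay non-empty rather than exactly K in number.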

Question 1:

In step 2, Kylin runs a Hive query to generate the intermediate Hive
table. So Kylin does use Hive to retrieve the data for the cube build.
Am I right?

Question 2:

Based on my understanding, Kylin only needs to cooperate with Hive in
steps 1 and 2. After that, Kylin does not need to retrieve data from the
Hive tables for the MapReduce jobs. Is that correct?
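
To make Question 2 concrete, here is a small sketch of how I picture step 3.
It assumes that only the first tier (the base cuboid) reads rows from the
flattened table, and that every later tier is aggregated from the previous
tier's output. That assumption is my reading of the slides, not something I
have confirmed in the Kylin source. The function name build_cuboids, the
"measure" column, and the use of SUM as the only aggregate are all made up
for the example.

```python
from collections import defaultdict


def build_cuboids(flat_rows, dims):
    """Build cuboids tier by tier, summing a single 'measure' column.

    flat_rows: list of dicts, one per row of the (hypothetical) flat table.
    dims: list of dimension column names.
    Returns a list of tiers; each tier maps a tuple of dimension names to
    a dict of {tuple of dimension values: summed measure}.
    """
    # Tier 1: the base cuboid, the only tier that reads the flat-table rows.
    base = defaultdict(int)
    for row in flat_rows:
        base[tuple(row[d] for d in dims)] += row["measure"]
    tiers = [{tuple(dims): dict(base)}]

    # Later tiers: each cuboid drops one dimension from a parent cuboid
    # in the previous tier and re-aggregates the parent's output.
    while len(next(iter(tiers[-1]))) > 1:
        next_tier = {}
        for parent_dims, parent in tiers[-1].items():
            for drop in range(len(parent_dims)):
                child_dims = parent_dims[:drop] + parent_dims[drop + 1:]
                if child_dims in next_tier:
                    continue  # already built from another parent
                child = defaultdict(int)
                for key, value in parent.items():
                    child[key[:drop] + key[drop + 1:]] += value
                next_tier[child_dims] = dict(child)
        tiers.append(next_tier)
    return tiers
```

With N dimensions this produces N tiers, and flat_rows is touched only while
building the first one, which is the behaviour I am asking about.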

[1]
http://www.slideshare.net/XuJiang2/kylin-hadoop-olap-engine/28?utm_source=slideview&utm_medium=ssemail&utm_campaign=share_clip

Best regards,

Zhong

On Sun, Jan 17, 2016 at 10:35 PM, yu feng <[email protected]> wrote:

> Firstly, Kylin does not distinguish between kinds of Hive tables. As long
> as you can query it in Hive, the table can be a normal table, an external
> table, a view, or a table with some SerDes.
> Secondly, I think it is hard to build a cube backward in time in Kylin.
> Maybe someone has a good idea on this point.
>
> 2016-01-18 11:04 GMT+08:00 zhong zhang <[email protected]>:
>
> > Hi All,
> >
> > I'm wondering whether I can build the Kylin cube backward in time. More
> > specifically, can I build the cube from the current time back to six
> > months ago, then from six months ago to 12 months ago, and so on? That
> > way, I can have the latest six months' cube results first.
> >
> > It's well known that the input of a Kylin cube is a Hive table. Does it
> > make any difference whether I use an internal or an external Hive table
> > when building the cube?
> >
> > Best regards,
> > Zhong
> >
>
