Re: Understanding the cube building process

Li Yang Sun, 08 May 2016 04:50:23 -0700

Many things affect cube build speed. From the workload point of view, it's
your data size and cube definition. From the capacity point of view, it's the
size and available resources of your Hadoop cluster. Finally, there are many
tuning options for the MR jobs. Checking whether the Hive table's mapper
splits are balanced is a good starting point.
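
As a rough illustration of what "balanced" could mean here, the sketch below
computes a simple skew ratio (largest split divided by the average split).
The function name and the sample sizes are hypothetical and not part of
Kylin; the real split sizes would have to be collected from your own job
history or counters.

```python
from statistics import mean


def split_skew(split_sizes):
    """Return max/mean of the split sizes; 1.0 means perfectly balanced."""
    if not split_sizes:
        raise ValueError("need at least one split size")
    avg = mean(split_sizes)
    if avg == 0:
        # All splits are empty, so none is larger than the others.
        return 1.0
    return max(split_sizes) / avg


# Hypothetical mapper input sizes in MB, for illustration only.
sizes = [128, 130, 125, 900]
print(f"skew = {split_skew(sizes):.2f}")
```

A ratio well above 1 suggests that one mapper is doing most of the work, so
the whole step waits on that straggler.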


Building multiple segments in parallel is no problem in theory. They run in
sequence at the moment purely for simplicity.

On Wed, May 4, 2016 at 3:09 PM, Vaibhav Taro <[email protected]>
wrote:

> I am also waiting for the document on Streaming cubes, glad to hear that
> it's in progress.
>
> The talk that you gave is very insightful. I still have a few doubts
> regarding the cube build process; it would be really helpful if you could
> clear them up.
>
> - The cube build process sometimes takes a long time. How can we optimize
> it? In my case, I don't have hierarchical dimensions or derived
> dimensions, so there is not much scope to optimize as per this doc:
> http://kylin.apache.org/docs15/howto/howto_optimize_cubes.html
>
> - I tried doing a cube refresh when there was no new data in that cube
> segment, and the build still took around 6 minutes. So it looks like
> there is scope to optimize the cube build process in such cases. In a
> nutshell, what are the factors affecting cube build time?
>
> - Is it possible to run a cube refresh for multiple cube segments in
> parallel?
>
> Thanks in advance.
>
>
>
> On Wed, May 4, 2016 at 11:43 AM, Li Yang <[email protected]> wrote:
>
> > Shaofeng is working on a document about Kafka and streaming cubing. Let's
> > wait.
> >
> > On Tue, May 3, 2016 at 11:26 PM, Nick Dimiduk <[email protected]>
> wrote:
> >
> > > Very nice talk, thank you. That helped put many things into context for
> > me.
> > > I will resume my study of the code for understanding engine
> > implementation
> > > details.
> > >
> > > One final question -- is there a doc for getting started with the
> > > experimental Kafka integration?
> > >
> > > Thanks,
> > > Nick
> > >
> > > On Tue, May 3, 2016 at 2:45 AM, Li Yang <[email protected]> wrote:
> > >
> > > > It's complicated. As of Kylin 1.5, there are two flavors of cubing
> > > > algorithm. Below talk covered a bit. There's no comprehensive
> document
> > at
> > > > the moment.
> > > >
> > > > https://www.youtube.com/watch?v=n74zvLmIgF0
> > > >
> > > >
> > > > On Tue, May 3, 2016 at 7:52 AM, Nick Dimiduk <[email protected]>
> > > wrote:
> > > >
> > > > > Hi there,
> > > > >
> > > > > I'm curious to understand how Kylin goes about building cubes. I've
> > > > > deployed it on a single-node cluster and played around with the
> > sample
> > > > cube
> > > > > [0]. Now i'm looking through the kylin server log and the code in
> the
> > > > > 'engine-mr'. I'm not finding much in the way of docs in the source
> > code
> > > > > though :(
> > > > >
> > > > > Is there any presentation, blog post, &c that gives and overview of
> > > these
> > > > > internals? I did find [1] but I'm looking go descend another level.
> > I'm
> > > > > curious about the various steps involved (looks like it ran 18
> > "steps"
> > > > and
> > > > > 10 MR jobs), what they're doing. I'm also curious about the schema
> > > design
> > > > > for the data model in HBase.
> > > > >
> > > > > Thanks in advance!
> > > > > -n
> > > > >
> > > > > [0]: http://kylin.apache.org/docs15/tutorial/kylin_sample.html
> > > > > [1]:
> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Regards,
> VaibhaV
>
