Re: Reply: Querying raw data / lowest granularity with Kylin

alex schufo Tue, 11 Aug 2015 10:36:21 -0700

Thanks for those details.

I read about mandatory dimensions in the presentation, but how does one
make a dimension mandatory in the Cube Builder UI?
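
For context, here is a minimal sketch of why mandatory dimensions would reduce
the number of combinations. It assumes (this is not confirmed anywhere in this
thread) that only combinations containing every mandatory dimension get built.
The dimension names are made up for illustration.

```python
from itertools import combinations


def dimension_combinations(dimensions, mandatory=()):
    """Return every non-empty subset of `dimensions` that contains all
    `mandatory` dimensions (assumed rule, see the note above)."""
    required = set(mandatory)
    result = []
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            if required.issubset(combo):
                result.append(combo)
    return result


# Hypothetical dimensions for the book store example.
dims = ["state", "city", "store", "sale_date", "book"]

all_combos = dimension_combinations(dims)
with_mandatory = dimension_combinations(dims, mandatory=["sale_date"])

print(len(all_combos), len(with_mandatory))
```

Under that assumption, each mandatory dimension roughly halves the number of
combinations, since it is no longer optional in any of them.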


In terms of use case I can see the following:

   - Drill down from hierarchies (aggregations) to the lowest
   granularity (raw data). For example, imagine you have book stores everywhere
   in the US: the user would pick a date range and see the number of sales per
   US State, then click on one State and see the sales per city for that State,
   then click on one city and see the sales per book store for that city, and
   finally, when clicking on one store, see the actual transactions
   that led to those sales totals.
   - Use Kylin as a single fast access point to Hadoop data: build cubes for
   regular OLAP processing, but also query other Hive tables that need
   dimensional filtering on raw data rather than aggregations, while
   benefiting from Kylin's SQL interface and fast HBase queries.
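
The first use case can be sketched as a series of "group by" queries, one per
level of the hierarchy. This is only an illustration: the table name
(book_sales), the column names (state, city, store_id, sale_date, amount) and
the helper function are all hypothetical.

```python
# Hypothetical hierarchy for the book store example, coarsest level first.
HIERARCHY = ["state", "city", "store_id"]


def drill_down_query(level, date_range, selected, table="book_sales"):
    """Build the aggregate query for one drill-down level.

    level      -- index into HIERARCHY of the column to group by
    date_range -- (start, end) pair chosen by the user
    selected   -- values already clicked on, e.g. {"state": "CA"}
    Returns the SQL text with "?" placeholders and the parameter list.
    """
    group_col = HIERARCHY[level]
    parents = HIERARCHY[:level]
    where = ["sale_date BETWEEN ? AND ?"] + [f"{col} = ?" for col in parents]
    sql = (
        f"SELECT {group_col}, SUM(amount) AS total_sales "
        f"FROM {table} WHERE {' AND '.join(where)} "
        f"GROUP BY {group_col}"
    )
    params = list(date_range) + [selected[col] for col in parents]
    return sql, params


# Second level: sales per city inside the State the user clicked on.
sql, params = drill_down_query(1, ("2015-01-01", "2015-06-30"), {"state": "CA"})
print(sql)
print(params)
```

Every level here is still an aggregate query. The last step of the drill
down, listing the individual transactions of one store, is the raw data
query this thread is about.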

These requirements are not as strong as the core OLAP use case that Kylin
serves, but supporting them would be very nice in my view, if it fits the
project.

On Tue, Aug 11, 2015 at 10:00 AM, Li Yang <[email protected]> wrote:

> > ... at least one "group by" should always be used.
>
> This is correct. The lowest granularity Kylin provides is obtained by
> grouping by all dimensions, which is what Alex has tried, if I understand
> correctly. We believe this can meet 90% of analysis requirements.
>
> > ... using a lot of space whereas in this case it would not necessarily be
> used.
>
> You can set dimensions to be "mandatory" so that fewer dimension
> combinations are calculated.  See more at
> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
>
> > "InvertedIndex" feature ... is still in early stage in terms of
> functionality and stability.
>
> Very true.  We experimented with "inverted-index" to address two
> requirements: 1) Near Real Time data readiness in Kylin;  2) Querying raw
> data.  Requirement 1) was later solved by another feature called Stream
> Cubing, so the priority of "inverted-index" dropped considerably, since the
> need for raw record analysis did not seem strong.
>
>
> Do you (or anyone) see raw record query as a must-have feature?  We'd like
> to hear your use cases.
>
> Cheers
> Yang
>
> On Tue, Aug 11, 2015 at 8:30 AM, Luke Han <[email protected]> wrote:
>
> > Currently, Kylin does not support detail/raw data queries, which is why,
> > as you already know, you have to add at least one "group by" to your query.
> >
> > As demand for this feature is growing, we are evaluating it and will
> > share our thinking here soon.
> >
> > The roadmap has changed a little because some priorities have shifted.
> > I'm drafting a new one for the coming release.
> >
> > Please let us know if any feature, function, or anything else is missing
> > that your use cases really need.
> >
> > Thanks.
> >
> >
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Mon, Aug 10, 2015 at 6:17 PM, Huang Hua <[email protected]>
> > wrote:
> >
> > > I haven't used the "InvertedIndex" feature, but I think the feature is
> > > still in early stage in terms of functionality and stability.
> > >
> > > Back when we were using kylin-0.6, we had a very similar use case:
> > > drilling down to the lowest granularity of the data.
> > > What we did was define the filter columns as dimensions (mostly defined
> > > as mandatory ones, to avoid cube expansion) and all other result columns
> > > as measures.
> > >
> > > You can think of our case as using Kylin to build a query index in
> > > HBase in order to support queries like "fetch all transactions given one
> > > or several user IDs, user names, or other such filters".
> > > Ultimately, however, we realized that Kylin might not be the best option
> > > for such queries, because Kylin is very good at rollup queries with
> > > pre-computed measures and a limited number of filters. Perhaps with
> > > enhancements to "InvertedIndex" we will see more possibilities from Kylin
> > > when dealing with lowest-granularity queries.
> > >
> > > Best,
> > > Hua
> > > > -----Original Message-----
> > > > From: dev-return-3593-[email protected]
> > > > [mailto:dev-return-[email protected]] On Behalf Of alex schufo
> > > > Sent: August 10, 2015 17:24
> > > > To: [email protected]
> > > > Subject: Querying raw data / lowest granularity with Kylin
> > > >
> > > > I have some scenarios where I would like to drill down to the lowest
> > > > granularity of my table. Does Kylin handle this?
> > > >
> > > > If I am not mistaken, at least one "group by" should always be used.
> > > >
> > > > So I tried to query by grouping by all my dimensions at the same time:
> > > > "select dim1, dim2, ..., dimN, sum(measure1), ..., sum(measureN) from ...
> > > > where ... group by dim1, dim2, ..., dimN". This gives me the expected
> > > > results. Is this the correct way to do it?
> > > >
> > > > Although this seems to work, with several dimensions it would mean
> > > > building a lot of cubes and using a lot of space whereas in this case it
> > > > would not necessarily be used. I know that aggregation groups can be used
> > > > to reduce this. With the same example I created 1 aggregation group for
> > > > each dimension and the expansion rate is 200%, but I tested with only 5
> > > > dimensions. Again, is this the correct way to do it?
> > > >
> > > > Related to this topic, I saw:
> > > >
> > > > v0.7.x: InvertedIndex (HybridOLAP)
> > > > Goal:
> > > > Introduce InvertedIndex to optimise queries on raw data and low level
> > > > aggregation
> > > >
> > > > on https://issues.apache.org/jira/browse/KYLIN-577
> > > >
> > > > Is this something that is currently available in 0.7.2? This ticket
> > > > dates back to the beginning of 2015, so I am not sure whether it
> > > > reflects Kylin's current plan.
> > >
> > >
> > >
> >
>
