Re: Querying raw data / lowest granularity with Kylin

Huang Hua Mon, 10 Aug 2015 03:18:45 -0700

I haven't used the "InvertedIndex" feature, but I think the feature is still at 
an early stage in terms of functionality and stability.


Back when we were using kylin-0.6, we had a very similar use case: drilling 
down to the lowest granularity of the data.
What we did was define the filter columns as dimensions (nearly all defined as 
mandatory ones, to avoid cube expansion) and all other result columns as 
measures.
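
To illustrate why marking dimensions as mandatory limits cube expansion, here is a minimal sketch. It assumes a plain full cube in which every optional dimension is either present or absent in a cuboid and every mandatory dimension is always present. The function name cuboid_count is hypothetical and not part of Kylin, and the sketch ignores aggregation groups, hierarchies and derived dimensions.

```python
# Hypothetical helper for illustration only; this is not a Kylin API.
def cuboid_count(num_dimensions: int, num_mandatory: int = 0) -> int:
    """Count the dimension combinations (cuboids) of a simple full cube.

    Assumption: an optional dimension may be present or absent in a cuboid,
    while a mandatory dimension appears in every cuboid and therefore adds
    no extra combinations.
    """
    if not 0 <= num_mandatory <= num_dimensions:
        raise ValueError("num_mandatory must be between 0 and num_dimensions")
    return 2 ** (num_dimensions - num_mandatory)


if __name__ == "__main__":
    # Five dimensions, none mandatory: every subset of dimensions is a cuboid.
    print(cuboid_count(5))
    # Five dimensions, four mandatory: only one dimension is left to toggle.
    print(cuboid_count(5, 4))
```

Under these assumptions, each dimension marked mandatory halves the number of combinations that need to be built.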

You can think of our case as using Kylin to build a query index in HBase 
in order to support queries like "fetch all transactions given user or server 
user IDs, user names, or other such filters".
However, we ultimately realized that Kylin might not be the best option for 
such queries, because Kylin is strongest at rollup queries with 
pre-computed measures and a limited number of filters. Perhaps with the 
"InvertedIndex" enhancement we will see more possibilities from Kylin for 
lowest-granularity queries.
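
One general caveat when treating a grouped query as a raw-data lookup: grouping by all dimensions returns one row per distinct combination of dimension values, not one row per source record. The sketch below shows this in plain Python, not Kylin. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for "select dims..., sum(measure) ... group by dims...".
def group_by_all_dimensions(rows, dims, measure):
    """Sum `measure` per distinct combination of the given dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


# Made-up sample data: two records share the same (user, server) pair.
rows = [
    {"user": "u1", "server": "s1", "amount": 10},
    {"user": "u1", "server": "s1", "amount": 5},
    {"user": "u2", "server": "s1", "amount": 7},
]

result = group_by_all_dimensions(rows, ["user", "server"], "amount")
# Three source records collapse into two groups.
print(len(rows), len(result))
```

So the grouped result matches the raw rows only when the dimension values identify each record uniquely.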

Best,
Hua     
> -----Original Message-----
> From: dev-return-3593-
> [email protected] [mailto:dev-return-
> [email protected]] On Behalf Of alex
> schufo
> Sent: August 10, 2015 17:24
> To: [email protected]
> Subject: Querying raw data / lowest granularity with Kylin
> 
> I have some scenarios where I would like to drill down to the lowest
> granularity of my table, does Kylin handle this?
> 
> If I am not mistaken, at least one "group by" should always be used.
> 
> So I tried to query by grouping by all my dimensions at the same time:
> "select dim1, dim2, ..., dimN, sum(measure1), ..., sum(measureN) from ...
> where ... group by dim1, dim2, ..., dimN". This gives me the expected results.
> Is this the correct way to do it?
> 
> Although this seems to work, with several dimensions it would mean building
> a lot of cubes and using a lot of space, whereas in this case it would not
> necessarily be used. I know that aggregation groups can be used to
> reduce this. With the same example I created 1 aggregation group for each
> dimension and the expansion rate is 200%, but I tested only on 5 dimensions.
> Again, is this the correct way to do it?
> 
> Related to this topic, I saw:
> 
> v0.7.x: InvertedIndex (HybridOLAP)
> Goal:
> Introduce InvertedIndex to optimise queries on raw data and low level
> aggregation
> 
> on https://issues.apache.org/jira/browse/KYLIN-577
> 
> Is this something that is currently available in 0.7.2? This ticket dates back
> to the beginning of 2015, so I am not sure if it reflects Kylin's current plan
> or not.
