Julian,

Inferring implicit hierarchies from highly correlated columns sounds
like an intriguing idea. Are you thinking that Kylin would automatically
infer that a set of columns is correlated and use that for storage
optimization, or more of a lazy specification of the hierarchies at
cuboid definition time?

Wanted to hear Yang's thoughts on this.

Regards
Seshu

On 6/19/15, 12:03 PM, "Julian Hyde" <[email protected]> wrote:

>I'd like to ask a provocative question: Why does Kylin have hierarchies?
>
>There may be some good reasons, but having thought for a long time about
>OLAP architectures I have come to the conclusion that hierarchies can be
>more trouble than they are worth. I regret that I made them so central to
>Mondrian's architecture; they are a part of the MDX language, so Mondrian
>had to have them in some form, but more of the system should have been
>built using attributes. Since Kylin is SQL-based, it doesn't need
>hierarchies at all.
>
>In OLAP, hierarchies are really useful in the presentation layer: a
>hierarchy is a drill path. If a user has just expanded attribute A (e.g.
>Year) then they are very likely to want to expand attribute B (e.g.
>Month) or C (e.g. Week). So, hierarchies improve the user's experience.
>
>In the engine and storage layer there are some concepts similar to
>hierarchies:
>functional dependencies (i.e. for a given value of X, column Y always has
>the same value),
>highly correlated columns (e.g. for a given value of zipcode, state
>almost always has the same value), and
>columns that are frequently aggregated together (e.g. a query rarely has
>"group by productName" but more often has "group by manufacturer, brand,
>productName").
>
>These allow the kinds of storage optimization that hierarchies allow in
>Kylin, but they can be inferred without human intervention*, are more
>general, and less restrictive. For example, when choosing the set of
>cuboids you would tend to include highly correlated columns (if you have
>just built a cuboid using zipcode, there is a high benefit and low
>incremental cost to add state and nation to it because state is highly
>correlated and nation is functionally dependent). That is the same
>outcome as having an explicit (nation, state, zipcode) hierarchy.
>
>So, I am not claiming that hierarchies are not useful; I am claiming that
>they are not essential. If you were to remove explicit support for
>hierarchies and replace them with fuzzier concepts like highly correlated
>columns you might find that the system becomes radically simpler at its
>core.
>
>Forgive me for being provocative. I want to challenge assumptions. If the
>architecture is working fine, feel free to disregard. But if you are
>seeing signs of architectural strain, this might be an opportunity to
>simplify.
>
>Julian
>
>* Functional dependencies can be inferred from the underlying star schema.
>Calcite's aggregate designer discovers highly correlated columns with no
>human intervention, just by profiling the data; and columns that are
>frequently aggregated together could be discovered by looking at query
>logs. Kylin could do something similar.
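
A rough sketch, in Python, of the kind of inference the footnote above
describes. It is not Calcite's aggregate designer and not anything in
Kylin; the function names (dependency_strength, co_grouped_pairs) and
the sample rows are made up for illustration. The first function
profiles rows to score how strongly one column determines another, and
the second counts which columns appear together in GROUP BY lists taken
from a hypothetical query log.

```python
from collections import Counter, defaultdict


def dependency_strength(rows, x, y):
    """Fraction of rows whose y value is the most common y for their x value.

    1.0 means y is functionally dependent on x in this sample; a value
    close to 1.0 suggests the two columns are highly correlated.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[x]][row[y]] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    agree = sum(max(c.values()) for c in counts.values())
    return agree / total


def co_grouped_pairs(group_by_lists):
    """Count how often each pair of columns shares a GROUP BY clause."""
    pairs = Counter()
    for cols in group_by_lists:
        ordered = sorted(set(cols))
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs[(a, b)] += 1
    return pairs


# Illustrative sample data only (not taken from any real table).
rows = [
    {"zipcode": "94107", "state": "CA", "nation": "US"},
    {"zipcode": "94107", "state": "CA", "nation": "US"},
    {"zipcode": "10001", "state": "NY", "nation": "US"},
    {"zipcode": "42223", "state": "KY", "nation": "US"},
    {"zipcode": "42223", "state": "TN", "nation": "US"},
    {"zipcode": "42223", "state": "KY", "nation": "US"},
]

# Hypothetical GROUP BY column lists, as if extracted from a query log.
query_log = [
    ["manufacturer", "brand", "productName"],
    ["manufacturer", "brand"],
    ["productName"],
]

nation_score = dependency_strength(rows, "zipcode", "nation")
state_score = dependency_strength(rows, "zipcode", "state")
pair_counts = co_grouped_pairs(query_log)
```

In this sample, nation scores as fully dependent on zipcode, while state
scores a little lower because one zipcode appears with two states. A
cuboid planner could treat scores above some threshold as a hint to keep
those columns together; the threshold would have to be tuned.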
