[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang, Gang updated KYLIN-2903:
------------------------------
Attachment: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch
Attached a patch.
One approach is to use the HQL 'COUNT DISTINCT' statement to calculate column
cardinality, and 'INSERT OVERWRITE DIRECTORY' to write the result to the
output path. So that the following step, HiveColumnCardinalityUpdateJob, can
recognize it, the output needs to follow this format:
column1 cardinality
column2 cardinality
column3 cardinality
.....
This format can be produced by setting 'ROW FORMAT DELIMITED' and adding
line breaks in the HQL.
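As a rough illustration of the idea above (not the code in the attached patch),
a statement of this shape could be assembled as follows. The class name
ViewCardinalityHql, the method build, the tab separator between column name and
cardinality, and the use of '\n' as the field delimiter are all assumptions
made for this sketch; the exact format HiveColumnCardinalityUpdateJob expects
should be checked against the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not taken from the attached patch.
final class ViewCardinalityHql {

    // Builds one HQL statement that writes "<column>\t<cardinality>" per line
    // into outputDir. Assumption: using '\n' as the field delimiter puts each
    // selected field on its own output line.
    static String build(String view, List<String> columns, String outputDir) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        List<String> fields = new ArrayList<>();
        for (String col : columns) {
            // One field per column: the column name, a tab, then its distinct count.
            fields.add("CONCAT_WS('\\t', '" + col + "', "
                    + "CAST(COUNT(DISTINCT `" + col + "`) AS STRING))");
        }
        return "INSERT OVERWRITE DIRECTORY '" + outputDir + "'\n"
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\n'\n"
                + "SELECT\n  " + String.join(",\n  ", fields) + "\n"
                + "FROM " + view;
    }
}
```

Calling ViewCardinalityHql.build with a view name, a list of column names, and
an output directory returns a single statement with one COUNT(DISTINCT ...)
field per column, so the view is scanned once. Names are concatenated without
escaping here, so this is only suitable for trusted metadata.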
> support cardinality calculation for Hive view
> ---------------------------------------------
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Wang, Gang
> Assignee: Wang, Gang
> Priority: Minor
> Attachments:
> 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch
>
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive
> tables. However, HCatalog does not support Hive views.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)