[
https://issues.apache.org/jira/browse/KYLIN-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wangxianbin updated KYLIN-1079:
-------------------------------
Comment: was deleted
(was: It's always better to follow the system architect, I will follow this
design)
> Manage large number of entries in metadata store
> ------------------------------------------------
>
> Key: KYLIN-1079
> URL: https://issues.apache.org/jira/browse/KYLIN-1079
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v2.0, v1.1, v1.0
> Reporter: hongbin ma
> Assignee: hongbin ma
> Labels: newbie
> Fix For: v2.1, v1.3
>
>
> Kylin saves cube metadata, table metadata, and job history/output in a
> metadata store. The HBaseMetadataStore is a fault-tolerant implementation
> that brings no extra dependencies to the system. We use it in real-world
> deployments.
> When a cube or Hive table is updated, the corresponding entries in the
> metadata store are simply updated in place (so there is no way to trace the
> history of cube definitions, although this is not an expected feature
> anyway). Job histories and outputs are a little different, however: each
> cubing job's definition and output are saved as new entries in the metadata
> store. As more and more jobs accumulate, a large number of job histories
> will reside in the metadata store. This may hurt frontend performance when
> a user queries job histories.
> We should tackle the problem from two perspectives:
> 1. A backend tool to delete/archive job histories based on given conditions,
> e.g. "all jobs that are older than one month and not referenced by any cube
> segment" (each cube segment keeps track of which job created it).
> 2. The frontend should enforce a timestamp filter when retrieving entries
> from the metadata store, for efficiency. When showing job lists, for
> example, a "Show last N days" filter is enforced, where N is configurable by
> the user. For HBaseMetadataStore, we save the timestamp of each entry in a
> separate column, which is where HBase's SingleColumnValueFilter can help.
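
The two selection rules above can be sketched in plain Java. This is only an illustration under stated assumptions, not Kylin code: JobHistoryRetention, JobEntry, cutoffMillis, selectDeletable and selectRecent are hypothetical names, "one month" is approximated as 30 days, and the set of referenced job IDs is assumed to have been collected from the cube segments beforehand. For HBaseMetadataStore, the comparison in selectRecent is the part that could be pushed to the server with a SingleColumnValueFilter on the timestamp column, assuming the timestamp is stored in a byte encoding whose ordering matches numeric ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names are actual Kylin classes.
class JobHistoryRetention {

    /** Minimal stand-in for one job entry in the metadata store. */
    static final class JobEntry {
        final String jobId;
        final long lastModifiedMillis;

        JobEntry(String jobId, long lastModifiedMillis) {
            this.jobId = jobId;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Oldest timestamp that still falls inside a window of the given number of days. */
    static long cutoffMillis(long nowMillis, int days) {
        return nowMillis - TimeUnit.DAYS.toMillis(days);
    }

    /** Perspective 1: jobs older than the cutoff that no cube segment references. */
    static List<String> selectDeletable(List<JobEntry> jobs, Set<String> referencedJobIds,
                                        long nowMillis, int maxAgeDays) {
        long cutoff = cutoffMillis(nowMillis, maxAgeDays);
        List<String> deletable = new ArrayList<>();
        for (JobEntry job : jobs) {
            boolean tooOld = job.lastModifiedMillis < cutoff;
            boolean referenced = referencedJobIds.contains(job.jobId);
            if (tooOld && !referenced) {
                deletable.add(job.jobId);
            }
        }
        return deletable;
    }

    /** Perspective 2: jobs modified within the last N days, for the job list view. */
    static List<String> selectRecent(List<JobEntry> jobs, long nowMillis, int lastNDays) {
        long cutoff = cutoffMillis(nowMillis, lastNDays);
        List<String> recent = new ArrayList<>();
        for (JobEntry job : jobs) {
            if (job.lastModifiedMillis >= cutoff) {
                recent.add(job.jobId);
            }
        }
        return recent;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long day = TimeUnit.DAYS.toMillis(1);
        List<JobEntry> jobs = Arrays.asList(
                new JobEntry("job-old-unreferenced", now - 45 * day),
                new JobEntry("job-old-referenced", now - 45 * day),
                new JobEntry("job-recent", now - 2 * day));
        // Assumed to be gathered from the cube segments' "created by job" field.
        Set<String> referenced = new HashSet<>(Arrays.asList("job-old-referenced"));

        System.out.println("deletable: " + selectDeletable(jobs, referenced, now, 30));
        System.out.println("recent:    " + selectRecent(jobs, now, 7));
    }
}
```

With the sample data in main, only the old, unreferenced job is selected for deletion, and only the two-day-old job passes the "last 7 days" filter. Keeping the cutoff computation in one helper means the cleanup tool and the frontend filter agree on what "N days ago" means.
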
> We can start working on this on the 2.x-staging branch (as it is the latest
> dev branch and is friendlier to developers), and backport it to the
> 1.x-staging branch if necessary.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)