Dayue Gao created KYLIN-2013:
--------------------------------
Summary: more robust approach to hive schema changes
Key: KYLIN-2013
URL: https://issues.apache.org/jira/browse/KYLIN-2013
Project: Kylin
Issue Type: Bug
Components: Metadata, REST Service, Web
Affects Versions: v1.5.3
Reporter: Dayue Gao
Assignee: Dayue Gao
Our users occasionally want to change an existing cube, for example by
adding, renaming, or removing a dimension. Some of these changes require
modifying the cube's source Hive table. When a user changes the table schema
and reloads its metadata in Kylin, several issues can occur, depending on
what was changed.
I ran some schema change tests against 1.5.3. The results after reloading
the table are listed below:
|| type of changes || fact table || lookup table ||
| *minor* | both query and build still work | query can fail or return wrong answer |
| *major* | fail to load related cube | fail to load related cube |
{{minor}} changes are those that don't change columns used in cubes, such as
inserting or appending a new column, or removing or changing an unused column.
{{major}} changes are the opposite, such as removing, renaming, or changing the
type of a used column.
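To make the minor/major distinction concrete, here is a minimal sketch of how a reload could classify a schema change. {{SchemaChangeClassifier}} and its method are hypothetical names for illustration, not existing Kylin code, and the sketch assumes a table schema can be reduced to a map from column name to data type.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Kylin): compares a table schema before and after reload. */
final class SchemaChangeClassifier {

    enum ChangeType { NONE, MINOR, MAJOR }

    /**
     * @param oldCols  column name -> data type before the reload
     * @param newCols  column name -> data type after the reload
     * @param usedCols names of columns referenced by at least one cube
     */
    static ChangeType classify(Map<String, String> oldCols,
                               Map<String, String> newCols,
                               Set<String> usedCols) {
        for (String col : usedCols) {
            String oldType = oldCols.get(col);
            if (oldType == null) {
                continue; // not a column of this table, so nothing to compare
            }
            String newType = newCols.get(col);
            // A used column that disappeared (removed or renamed) or changed
            // its type is a major change.
            if (newType == null || !newType.equals(oldType)) {
                return ChangeType.MAJOR;
            }
        }
        // All used columns are intact; any remaining difference only touches
        // unused columns and is therefore minor.
        return oldCols.equals(newCols) ? ChangeType.NONE : ChangeType.MINOR;
    }
}
```

Note that a rename shows up here as a removed column plus an added one, which is why renaming a used column is classified as major.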
As the table shows, reloading a changed table is problematic in certain cases.
KYLIN-1536 reports a similar problem.
So what can we do to support this kind of iterative development process (load
-> define cube -> build -> reload -> change cube -> rebuild)?
My first thought is to simply detect and prohibit reloading a table that is in
use. The user should be told which cube is preventing the reload, and could
then drop and recreate the cube after reloading. However, defining a cube is
not an easy task (consider editing 100 measures). Forcing users to recreate
their cubes over and over again will certainly not make them happy.
A better idea is to let a cube remain editable even if it is broken because
some columns changed after reloading. A broken cube can't be built or queried;
it can only be edited or dropped. In fact, there is a cube status called
{{RealizationStatusEnum.DESCBROKEN}} in the code, but it is never used. We should
take advantage of it.
An enabled cube shouldn't allow schema changes, otherwise an unintentional
reload could make it unavailable. Similarly, a disabled but unpurged cube
shouldn't allow schema changes since it still has data in it.
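The reload rule above could be sketched roughly as follows. {{ReloadGuard}} and {{CubeInfo}} are hypothetical stand-ins, not Kylin classes, and the sketch assumes that "unpurged" can be approximated by a non-zero segment count. It returns the names of the blocking cubes so the user can see what is preventing the reload.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical guard (not part of Kylin) that decides whether a table may be reloaded. */
final class ReloadGuard {

    /** Minimal stand-in for a cube; assumed fields, not Kylin's CubeInstance. */
    static final class CubeInfo {
        final String name;
        final boolean enabled;
        final int segmentCount; // assumption: 0 means the cube has been purged

        CubeInfo(String name, boolean enabled, int segmentCount) {
            this.name = name;
            this.enabled = enabled;
            this.segmentCount = segmentCount;
        }
    }

    /** Returns the cubes that block a reload; an empty list means the reload may proceed. */
    static List<String> blockingCubes(List<CubeInfo> cubesUsingTable) {
        List<String> blockers = new ArrayList<>();
        for (CubeInfo cube : cubesUsingTable) {
            // Enabled cubes block the reload, and so do disabled cubes that still hold data.
            if (cube.enabled || cube.segmentCount > 0) {
                blockers.add(cube.name);
            }
        }
        return blockers;
    }
}
```

With this rule, only cubes that are both disabled and purged allow their source table to be reloaded.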
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)