Hi Dayue, here are some comments from my side:

1. The signature covers not only the fact table name and partition info,
but also the dimensions and measures. The DimensionDesc object also
contains the table name, join condition, columns, etc., which describe the
lookup tables. So once the data model changes, the signature changes as
well.

2. The old signature is persisted so that it can be compared with the new
signature after it is returned from the front-end; see:
https://github.com/apache/incubator-kylin/blob/0.7-staging/server/src/main/java/org/apache/kylin/rest/service/CubeService.java#L239

3. About metadata consistency: what 0.7 does was a temporary solution and
is not well implemented. From 0.8 on, the Kylin UI has changed a lot;
creating/updating the data model is a separate step from creating/updating
the cube, which will be easier to control.
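
To make points 1 and 2 concrete, here is a rough sketch in Java of a
signature that covers the data model fields and of the comparison against
the persisted value. It is only an illustration: the class name, the method
signatures, the string encoding of dimensions and the use of SHA-256 are
all my assumptions, not the actual code in CubeDesc or CubeService.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical sketch only; names and hashing choice are not Kylin's real API.
final class SignatureSketch {

    // Hash every field that affects how the cube is built, including the
    // lookup-table details (table, join condition) carried by each dimension.
    static String calculateSignature(String factTable, String partitionDesc,
                                     List<String> dimensions, List<String> measures) {
        // Naive canonical form: fields joined with '|', list items with ','.
        String canonical = String.join("|", factTable, partitionDesc,
                String.join(",", dimensions), String.join(",", measures));
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonical.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the signature persisted with the old desc no longer matches
    // the one recalculated from the desc sent back by the front-end.
    static boolean needsRebuild(String storedSignature, String recalculated) {
        return !recalculated.equals(storedSignature);
    }

    public static void main(String[] args) {
        List<String> measures = Arrays.asList("SUM(PRICE)");
        String stored = calculateSignature("SALES_FACT", "PART_DT",
                Arrays.asList("CAL_DT:inner:SALES_FACT.CAL_DT=CAL_DT.CAL_DT"), measures);
        // Same cube, but the lookup table join type was changed in the model.
        String afterJoinChange = calculateSignature("SALES_FACT", "PART_DT",
                Arrays.asList("CAL_DT:left:SALES_FACT.CAL_DT=CAL_DT.CAL_DT"), measures);
        System.out.println(needsRebuild(stored, stored));
        System.out.println(needsRebuild(stored, afterJoinChange));
    }
}
```

Running main prints false and then true: an unchanged desc keeps its
signature, while a change to a lookup table join yields a different one,
which is the case where the already built segments should be cleared.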

2015-08-25 5:12 GMT+08:00 Dayue Gao <[email protected]>:

> Hi developers,
>
> When I was working on https://issues.apache.org/jira/browse/KYLIN-958, I
> found it difficult to implement CubeController.updateCubeDesc. The
> problems are
>
> 1. CubeDesc.calculateSignature only includes the fact table name and
> partition desc as data model information
>
> This means that if a user changes the lookup tables or filter condition,
> the cube desc signature won't change and Kylin will not clear the already
> built cube segments. BTW, why do we store the signature in metadata rather
> than calculate it on demand? I know it may be an optimization to avoid
> recalculating the signature every time; however, changing a desc shouldn't
> be a regular operation, so persisting the signature won't give us much
> benefit. What's more, once it's been recorded in metadata, it becomes
> difficult for us to change the computing logic.
>
> 2. Maintain metadata consistency
>
> This is a more general problem. We have separated metadata into different
> files (cube, cube_desc, model_desc, project, etc.), and maintaining
> consistency across these files is not an easy task in either
> FileResourceStore or HBaseResourceStore, so IMO we'd better avoid
> operations that change multiple metadata files as much as possible.
> "CubeController.updateCubeDesc" is a notable counter-example. In order to
> complete this operation, a sequence of metadata updates (model_desc -> cube
> -> cube_desc -> cube -> project) is performed. Making sure
> "CubeController.updateCubeDesc" won't leave metadata in a half-successful
> state is not easy.
>
> Given all these difficulties, do we really need to allow users to change
> the data model? Can we just make the data model immutable and only allow
> users to change the cube desc? Immutable or versioned metadata is always
> good in my experience, so a further question is: can we make the key parts
> of the cube desc (properties that define how the cube was built, excluding
> description and notify_list, for example) immutable as well, and just add
> a shortcut in the front-end to let users create a new cube desc based on
> an existing one?
>
> Best,
> Dayue
