Hi Dayue, here are some comments from my side:

1. The signature covers not only the fact table name and partition info, but also the dimensions and measures. The DimensionDesc object also contains the table name, join condition, columns, etc. of the lookup tables, so once the data model changes, this signature changes as well.

2. The old signature is persisted so that it can be compared with the new signature after the cube desc is returned from the front-end; see: https://github.com/apache/incubator-kylin/blob/0.7-staging/server/src/main/java/org/apache/kylin/rest/service/CubeService.java#L239

3. About metadata consistency: in 0.7 it was a temporary solution that is not well implemented. From 0.8, the Kylin UI has changed a lot; creating/updating the data model is a separate step from creating/updating the cube, which will be easier to control.

2015-08-25 5:12 GMT+08:00 Dayue Gao <[email protected]>:

> Hi developers,
>
> When I was working on https://issues.apache.org/jira/browse/KYLIN-958, I
> found it difficult to implement CubeController.updateCubeDesc. The problems
> are:
>
> 1. CubeDesc.calculateSignature only includes the fact table name and
> partition desc as data model information
>
> This means that if a user changes the lookup tables or the filter condition,
> the cube desc signature won't change and Kylin will not clear the already
> built cube segments. BTW, why do we store the signature in metadata rather
> than calculate it on demand? I know it may be an optimization to avoid
> recalculating the signature every time; however, changing the desc shouldn't
> be a regular operation, so persisting the signature won't give us much
> benefit. What's more, once it has been recorded in metadata, it becomes
> difficult for us to change the computing logic.
>
> 2. Maintaining metadata consistency
>
> This is a more general problem. As we have separated metadata into
> different files (cube, cube_desc, model_desc, project, etc.), and
> maintaining consistency across these files is not an easy task in either
> FileResourceStore or HBaseResourceStore, IMO we'd better avoid operations
> that change multiple metadata files as much as possible.
> "CubeController.updateCubeDesc" is a notable counter-example. In order to
> complete this operation, a sequence of metadata updates (model_desc -> cube
> -> cube_desc -> cube -> project) is performed. Making sure that
> "CubeController.updateCubeDesc" won't leave metadata in a half-successful
> state is not easy.
>
> Given all these difficulties, do we really need to allow users to change
> the data model? Can we just make the data model immutable and only allow
> users to change the cube desc? Immutable or versioned metadata is always
> good in my experience, so a further question is: can we make the key parts
> of the cube desc (properties that define how the cube was built, excluding
> description and notify_list, for example) also immutable, and just add a
> shortcut in the front-end to let users create a new cube desc based on an
> existing one?
>
> Best,
> Dayue
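
To illustrate the signature discussion in this thread, here is a minimal sketch in Java of the general idea: hash every property that affects how a cube is built (fact table, partition info, dimensions including lookup-table joins, measures), persist the result, and compare it with a freshly calculated value when an updated desc comes back from the front-end. This is not Kylin's actual CubeDesc.calculateSignature; the class name, method names, string encoding, and the choice of MD5 plus Base64 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical sketch; none of these names come from the Kylin code base.
class SignatureSketch {

    // Hashes the build-relevant parts of a (simplified) cube description.
    // Each dimension string is assumed to carry its lookup table and join
    // condition, so changing a lookup table changes the signature.
    static String calculateSignature(String factTable, String partitionDesc,
                                     List<String> dimensions, List<String> measures) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            String canonical = factTable + "|" + partitionDesc + "|"
                    + String.join(",", dimensions) + "|"
                    + String.join(",", measures);
            byte[] digest = md.digest(canonical.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Compares the persisted signature with one recalculated from the updated desc.
    // A mismatch means already built segments can no longer be trusted.
    static boolean buildRelevantPartsChanged(String storedSignature, String recalculated) {
        return !storedSignature.equals(recalculated);
    }

    public static void main(String[] args) {
        List<String> dims = Arrays.asList("FACT.DT", "LOOKUP_CAL ON FACT.DT = LOOKUP_CAL.DT");
        List<String> measures = Arrays.asList("SUM(PRICE)", "COUNT(1)");
        String stored = calculateSignature("FACT", "FACT.DT", dims, measures);

        // Same model again: the signature is stable.
        String same = calculateSignature("FACT", "FACT.DT", dims, measures);
        System.out.println("unchanged model, changed? " + buildRelevantPartsChanged(stored, same));

        // Lookup join altered: the signature differs.
        List<String> newDims = Arrays.asList("FACT.DT", "LOOKUP_CAL ON FACT.WEEK = LOOKUP_CAL.WEEK");
        String updated = calculateSignature("FACT", "FACT.DT", newDims, measures);
        System.out.println("changed lookup join, changed? " + buildRelevantPartsChanged(stored, updated));
    }
}
```

The sketch also shows the trade-off raised in the quoted message: once a signature is persisted, any change to how the canonical string is assembled invalidates every stored value, whereas calculating on demand avoids that at the cost of recomputation.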
