Cool! I had a nice talk with Shaofeng and Li Yang this afternoon; I will fix this JIRA soon.
Best,
Dayue

> On Aug 25, 2015, at 2:46 PM, Luke Han <[email protected]> wrote:
>
> Hi Dayue,
> You are right, metadata is the key part of a system.
> For KYLIN-958, you could apply any workaround for the short term. For the
> long term, we will go through the current implementation and try to fix it
> with the right approach to avoid conflicts.
>
> Underlying storage is not an issue; actually, we just migrated from MySQL
> to HBase in the early 0.6 version to remove one more dependency. I think the
> metadata storage has already been extracted as an interface, so it should
> be easy to add other storage again if necessary.
>
> Thanks.
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Tue, Aug 25, 2015 at 11:35 AM, Dayue Gao <[email protected]> wrote:
>
>> Metadata consistency is one of the most crucial things for many systems.
>>
>> So in the short run, to fix KYLIN-958, I suggest disallowing users from
>> updating the data model. Even so, users can still create a new data model
>> to fulfill their needs.
>>
>> In the long run, I'd suggest migrating metadata persistence from a NoSQL
>> store like HBase to a transactional database like MySQL. Although a lot
>> of work needs to be done, it will make keeping metadata consistent a lot
>> easier.
>>
>> What do you think?
>>
>> Best,
>> Dayue
>>
>>> On Aug 25, 2015, at 11:11 AM, Li Yang <[email protected]> wrote:
>>>
>>> Dayue has a good point. Although updating multiple resources in one
>>> request is doable, the complexity is not worth the effort.
>>>
>>> Making model desc and cube desc immutable is a good idea. And we can
>>> still implement "update" by first deleting the old model and cube, then
>>> creating new ones with the same name. So from the user's point of view,
>>> it looks like an update. This workaround should do well on the 0.7
>>> branch, where model and cube are strictly 1-1.
>>>
>>> The reason model and cube are separate resources is that in the 0.8
>>> branch they have a 1-m relationship. A user can create a model and then
>>> create multiple cubes on it.
>>>
>>> On Tue, Aug 25, 2015 at 10:31 AM, hongbin ma <[email protected]> wrote:
>>>
>>>> hi dayue,
>>>>
>>>> I agree with you. The current cube desc/model desc design is the result
>>>> of multiple rounds of redesign, and it may have failed to give enough
>>>> consideration to ease of maintenance. And to be honest it's quite
>>>> complex now, especially where cube/model updates are involved.
>>>>
>>>> Making cubes/models immutable looks appealing to me. However, we might
>>>> need some more front-end work to reduce the overhead of recreating
>>>> cubes/models for users.
>>>>
>>>> @liyang and @luke will you please comment on this?
>>>>
>>>> On Tue, Aug 25, 2015 at 5:12 AM, Dayue Gao <[email protected]> wrote:
>>>>
>>>>> Hi developers,
>>>>>
>>>>> When I was working on https://issues.apache.org/jira/browse/KYLIN-958,
>>>>> I found it difficult to implement CubeController.updateCubeDesc. The
>>>>> problems are:
>>>>>
>>>>> 1. CubeDesc.calculateSignature only includes the fact table name and
>>>>> partition desc as data model information
>>>>>
>>>>> This means that if a user changes lookup tables or the filter
>>>>> condition, the cube desc signature won't change and Kylin will not
>>>>> clear already built cube segments. BTW, why do we store the signature
>>>>> in metadata rather than calculate it on demand? I know it may be an
>>>>> optimization to avoid recalculating the signature every time; however,
>>>>> changing a desc shouldn't be a regular operation, so persisting the
>>>>> signature won't give us much benefit. What's more, once it's been
>>>>> recorded in metadata, it becomes difficult for us to change the
>>>>> computing logic.
>>>>>
>>>>> 2. Maintaining metadata consistency
>>>>>
>>>>> This is a more general problem. We have separated metadata into
>>>>> different files (cube, cube_desc, model_desc, project, etc.), and
>>>>> maintaining consistency across these files is not an easy task in
>>>>> either FileResourceStore or HBaseResourceStore, so IMO we'd better
>>>>> avoid operations that change multiple metadata files as much as
>>>>> possible. "CubeController.updateCubeDesc" is a notable
>>>>> counter-example. In order to complete this operation, a sequence of
>>>>> metadata updates (model_desc -> cube -> cube_desc -> cube -> project)
>>>>> is performed. Making sure that "CubeController.updateCubeDesc" won't
>>>>> leave metadata in a half-successful state is not easy.
>>>>>
>>>>> Given all these difficulties, do we really need to allow users to
>>>>> change the data model? Can we just make the data model immutable and
>>>>> only allow users to change the cube desc? Immutable or versioned
>>>>> metadata is always good in my experience, so a further question is:
>>>>> can we also make the key parts of the cube desc immutable (the
>>>>> properties that define how the cube was built, excluding, for example,
>>>>> description and notify_list) and just add a shortcut in the front end
>>>>> to let users create a new cube desc based on an existing one?
>>>>>
>>>>> Best,
>>>>> Dayue
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> *Bin Mahone | 马洪宾*
>>>> Apache Kylin: http://kylin.io
>>>> Github: https://github.com/binmahone
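
On problem 1 in the original message (the signature covers only the fact table name and partition desc, and is persisted rather than computed on demand), a minimal Java sketch of the alternative might look like the following. This is not Kylin code: the class ModelSignatureSketch, the method signature(), its parameters, the table names, and the choice of SHA-256 are all assumptions made for illustration. The only idea it shows is that every model property that affects how a cube is built goes into the digest, and that the value is recomputed when needed instead of being stored.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Kylin code: every name here is an assumption.
final class ModelSignatureSketch {

    // Computes a signature on demand from all model properties that affect
    // how a cube is built, instead of reading a persisted value.
    static String signature(String factTable, String partitionDesc,
                            List<String> lookupTables, String filterCondition) {
        // Sort a copy so that the order of lookup tables does not matter
        // (an assumption of this sketch).
        List<String> sortedLookups = new ArrayList<>(lookupTables);
        Collections.sort(sortedLookups);

        StringBuilder sb = new StringBuilder();
        append(sb, factTable);
        append(sb, partitionDesc);
        for (String table : sortedLookups) {
            append(sb, table);
        }
        append(sb, filterCondition);

        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Length-prefixes each field so adjacent fields cannot run together;
    // a null field is treated as the empty string.
    private static void append(StringBuilder sb, String field) {
        String value = field == null ? "" : field;
        sb.append(value.length()).append(':').append(value).append(';');
    }

    public static void main(String[] args) {
        List<String> lookups = Arrays.asList("DIM_A", "DIM_B");
        String before = signature("FACT_SALES", "PART_DT", lookups, "PRICE > 0");
        String after = signature("FACT_SALES", "PART_DT", lookups, "PRICE > 10");
        System.out.println("signature changed: " + !before.equals(after));
    }
}
```

With this shape, changing a lookup table or the filter condition yields a different signature, while reordering the lookup tables does not. Whether ordering should matter in Kylin is a design decision this sketch does not settle.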
