Cool!

Had a nice talk with Shaofeng and Li Yang this afternoon. I will fix this JIRA
soon.

Best,
Dayue


> On Aug 25, 2015, at 2:46 PM, Luke Han <[email protected]> wrote:
> 
> Hi Dayue,
>    You are right, metadata is the key part of a system.
>    For KYLIN-958, you can apply any workaround in the short term. In the long
> term, we will go through the current implementation and try to fix it with the
> right approach to avoid conflicts.
> 
>    Underlying storage is not an issue. We actually migrated from MySQL
> to HBase in an early 0.6 version to remove one more dependency. I think the
> metadata storage has already been extracted as an interface, so it should be
> easy to add other storage again if necessary.
> 
>    Thanks.
> 
> Best Regards!
> ---------------------
> 
> Luke Han
> 
> On Tue, Aug 25, 2015 at 11:35 AM, Dayue Gao <[email protected]> wrote:
> 
>> Metadata consistency is one of the most crucial things for many systems.
>> 
>> So in the short run, to fix KYLIN-958, I suggest disallowing users from
>> updating the data model. Even so, users can still create a new data model to
>> fulfill their needs.
>> 
>> In the long run, I'd suggest migrating metadata persistence from a NoSQL
>> store like HBase to a transactional database like MySQL. Although a lot of
>> work needs to be done, it will make keeping metadata consistent a lot easier.
>> 
>> What do you think?
>> 
>> Best,
>> Dayue
>> 
>>> On Aug 25, 2015, at 11:11 AM, Li Yang <[email protected]> wrote:
>>> 
>>> Dayue has a good point. Although updating multiple resources in one request
>>> is doable, the complexity is not worth the effort.
>>> 
>>> Making model desc and cube desc immutable is a good idea. And we can still
>>> implement "update" by first deleting the old model and cube, then creating
>>> new ones with the same name. So from the user's point of view, it looks like
>>> an update. This workaround should do well on the 0.7 branch, where model and
>>> cube are strictly 1-1.
>>> 
>>> The reason model and cube are separate resources is that in the 0.8 branch
>>> they have a 1-m relationship. A user can create a model and create multiple
>>> cubes on it.
>>> 
>>> On Tue, Aug 25, 2015 at 10:31 AM, hongbin ma <[email protected]> wrote:
>>> 
>>>> hi dayue,
>>>> 
>>>> I agree with you. The current cube desc/model desc design is the result of
>>>> multiple rounds of redesign, and it may have failed to take maintenance
>>>> convenience into proper consideration. To be honest, it's quite complex
>>>> now, especially where cube/model updates are involved.
>>>> 
>>>> Making cubes/models immutable looks appealing to me. However, we might need
>>>> some more front-end work to reduce the cube/model re-creation overhead for
>>>> users.
>>>> 
>>>> @liyang and @luke, will you please comment on this?
>>>> 
>>>> On Tue, Aug 25, 2015 at 5:12 AM, Dayue Gao <[email protected]> wrote:
>>>> 
>>>>> Hi developers,
>>>>> 
>>>>> When I was working on https://issues.apache.org/jira/browse/KYLIN-958,
>>>>> I found it difficult to implement CubeController.updateCubeDesc. The
>>>>> problems are:
>>>>> 
>>>>> 1. CubeDesc.calculateSignature only includes the fact table name and
>>>>> partition desc as data model information
>>>>> 
>>>>> This means that if a user changes lookup tables or the filter condition,
>>>>> the cube desc signature won't change and Kylin will not clear the already
>>>>> built cube segments. BTW, why do we store the signature in metadata rather
>>>>> than calculate it on demand? I know it may be an optimization to avoid
>>>>> recalculating the signature every time. However, changing a desc shouldn't
>>>>> be a regular operation, so persisting the signature won't give us much
>>>>> benefit. What's more, once it has been recorded in metadata, it becomes
>>>>> difficult for us to change the computing logic.
>>>>> 
>>>>> 2. Maintaining metadata consistency
>>>>> 
>>>>> This is a more general problem. We have separated metadata into
>>>>> different files (cube, cube_desc, model_desc, project, etc.), and
>>>>> maintaining consistency across these files is not an easy task in either
>>>>> FileResourceStore or HBaseResourceStore. IMO we'd better avoid operations
>>>>> that change multiple metadata files as much as possible.
>>>>> "CubeController.updateCubeDesc" is a notable counter-example. In order to
>>>>> complete this operation, a sequence of metadata updates (model_desc ->
>>>>> cube -> cube_desc -> cube -> project) is performed. Making sure that
>>>>> "CubeController.updateCubeDesc" won't leave metadata in a half-successful
>>>>> state is not easy.
>>>>> 
>>>>> Given all these difficulties, do we really need to allow users to change
>>>>> the data model? Can we just make the data model immutable and only allow
>>>>> users to change the cube desc? Immutable or versioned metadata is always
>>>>> good in my experience, so a further question is: can we also make the key
>>>>> parts of the cube desc (properties that define how the cube was built,
>>>>> excluding description and notify_list, for example) immutable, and just
>>>>> add a shortcut in the front end to let users create a new cube desc based
>>>>> on an existing one?
>>>>> 
>>>>> Best,
>>>>> Dayue
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Regards,
>>>> 
>>>> *Bin Mahone | 马洪宾*
>>>> Apache Kylin: http://kylin.io
>>>> Github: https://github.com/binmahone
>>>> 
>> 
>> 
>> 
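
Point 1 of the first message in this thread (the signature only covering the fact table name and partition desc) can be illustrated with a minimal sketch. It assumes a signature computed on demand over all model inputs that affect built segments. The class name SignatureSketch, the method signature(...) and its parameters are hypothetical and are not Kylin's actual CubeDesc.calculateSignature.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SignatureSketch {
    // Hypothetical helper: hashes every model input that affects built
    // segments, including lookup tables and the filter condition.
    static String signature(String factTable, List<String> lookupTables,
                            String filterCondition, String partitionDesc) {
        List<String> sortedLookups = new ArrayList<>(lookupTables);
        Collections.sort(sortedLookups); // lookup order should not matter
        String input = String.join("|", factTable,
                String.join(",", sortedLookups), filterCondition, partitionDesc);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String before = signature("FACT", Arrays.asList("L1", "L2"), "x > 1", "DT");
        String after = signature("FACT", Arrays.asList("L1", "L2"), "x > 2", "DT");
        // A changed filter condition yields a different signature.
        System.out.println(before.equals(after));
    }
}
```

Because nothing is persisted in this sketch, the hashing logic could be changed later without migrating stored signatures, which is the trade-off raised in the original message.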


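The immutable or versioned metadata idea from the first message in this thread can be sketched as follows. This is one possible reading, not Kylin's implementation: the class VersionedDescSketch, its methods, and the path layout are all hypothetical, and a HashMap stands in for a real resource store. Desc versions are written once and never overwritten, so an "update" ends with a single write to the cube record instead of a multi-file sequence.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedDescSketch {
    // Hypothetical in-memory stand-in for a resource store (path -> content).
    private final Map<String, String> store = new HashMap<>();

    /** Writes a desc under a new versioned path; existing versions are never overwritten. */
    String addDescVersion(String name, int version, String content) {
        String path = "/cube_desc/" + name + "/v" + version;
        if (store.putIfAbsent(path, content) != null) {
            throw new IllegalStateException("desc version is immutable: " + path);
        }
        return path;
    }

    /** The only mutable record: one write switches the cube to another desc version. */
    void pointCubeAt(String cubeName, String descPath) {
        if (!store.containsKey(descPath)) {
            throw new IllegalArgumentException("unknown desc: " + descPath);
        }
        store.put("/cube/" + cubeName, descPath);
    }

    /** Returns the content of the desc the cube currently points at, or null. */
    String currentDesc(String cubeName) {
        return store.get(store.get("/cube/" + cubeName));
    }

    public static void main(String[] args) {
        VersionedDescSketch s = new VersionedDescSketch();
        s.pointCubeAt("sales_cube", s.addDescVersion("sales", 1, "dims=a,b"));
        String v2 = s.addDescVersion("sales", 2, "dims=a,b,c");
        // Until the pointer is switched, the cube still sees version 1.
        System.out.println(s.currentDesc("sales_cube"));
        s.pointCubeAt("sales_cube", v2);
        System.out.println(s.currentDesc("sales_cube"));
    }
}
```

If the process stops after addDescVersion but before pointCubeAt, the cube still references the old, complete desc, so no half-updated state is visible to readers.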