[ 
https://issues.apache.org/jira/browse/KYLIN-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869858#comment-17869858
 ] 

Zhiting Guo commented on KYLIN-5945:
------------------------------------

h2. Design
h4. Metadata Splitting
Split the main metadata table of Kylin 4 into 28 smaller tables. Tables that 
already exist independently in Kylin 4 (such as job_info and auditLog) are not 
affected.
The metadata layout in the diagnostic package has also been flattened: each 
table corresponds to one directory.
!https://kyligence.feishu.cn/space/api/box/stream/download/asynccode/?code=ODQ2NjQ0ZGVmN2ZkNWQ2MWQ1ZWU1NDBiOGQ1OGNlNzVfclhIaE9EVGp5RkRYZU9aRUhERHREVzE4cmw4eEJPNDZfVG9rZW46Tzd4RGJyZEVvb0gwS3N4dlN5Q2NxS3NYbjhlXzE3MjI0MjAxOTg6MTcyMjQyMzc5OF9WNA|width=636,height=487!
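As a rough illustration of the split, the sketch below groups entries of a single key-value store into one bucket per resource type, which is the shape that "one table, one directory" implies. The key layout {{/<project>/<type>/<name>.json}} and the type names are assumptions made for this example only; the actual 28 table names are not listed in this comment.

```python
from collections import defaultdict


def split_by_type(entries):
    """Group key-value metadata entries into one bucket per resource type.

    Keys are ASSUMED (for illustration only) to look like
    "/<project>/<type>/<name>.json".
    """
    tables = defaultdict(dict)
    for key, content in entries.items():
        parts = key.strip("/").split("/")
        if len(parts) != 3:
            raise ValueError(f"unexpected key layout: {key}")
        project, resource_type, name = parts
        # One bucket per type stands in for one small table / one directory.
        tables[resource_type][(project, name)] = content
    return dict(tables)


# Hypothetical entries; none of these names come from the Kylin code base.
entries = {
    "/sales/model/m1.json": {"uuid": "m1"},
    "/sales/table_info/t1.json": {"uuid": "t1"},
    "/ops/model/m2.json": {"uuid": "m2"},
}
tables = split_by_type(entries)
```

With this grouping, updating one model only touches the bucket for models instead of a single shared table.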
h4. Metadata read-write separation
Metadata reads work as they did before the refactor: they are still served 
from the in-memory cache, so the performance of operations such as queries 
and listing models is not affected.
Metadata updates now go directly to the database and rely on database 
transactions to support concurrent updates from multiple nodes.
The transaction flow after the change is as follows:
!https://kyligence.feishu.cn/space/api/box/stream/download/asynccode/?code=N2EzMTdkNmYyZjUwOTQ5OWI2N2I4YmE2ZDJlYzU1ZGJfZFlVM2xXVzdCa3NwV0FpZEI3aXo5b0xUU0QyS3ZoSmVfVG9rZW46SFYzbGJpY05Qb0Z3dEl4ZmZyN2NvTml0blVkXzE3MjI0MjAxOTg6MTcyMjQyMzc5OF9WNA|width=594,height=613!
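The read/write split can be sketched as follows. This is a minimal illustration, not the Kylin implementation: it uses SQLite in place of the real metadata database, and the class name, table name, column names, and the version-check (optimistic locking) strategy are all assumptions made for the example. Reads never touch the database; writes run inside a database transaction and refresh the local cache only after the commit succeeds.

```python
import sqlite3


class ConflictError(Exception):
    """Raised when another writer has already changed the row."""


class MetadataStore:
    """Hypothetical store: cached reads, transactional writes."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}  # key -> (content, version)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS model ("
            "meta_key TEXT PRIMARY KEY, content TEXT NOT NULL, "
            "version INTEGER NOT NULL)"
        )
        conn.commit()

    def read(self, key):
        # Reads are served from memory only.
        return self.cache.get(key)

    def create(self, key, content):
        with self.conn:  # commits on success, rolls back on exception
            self.conn.execute(
                "INSERT INTO model (meta_key, content, version) VALUES (?, ?, 0)",
                (key, content),
            )
        self.cache[key] = (content, 0)

    def update(self, key, content, expected_version):
        with self.conn:
            cur = self.conn.execute(
                "UPDATE model SET content = ?, version = version + 1 "
                "WHERE meta_key = ? AND version = ?",
                (content, key, expected_version),
            )
            if cur.rowcount != 1:
                # Raising inside the block rolls the transaction back.
                raise ConflictError(key)
        # The cache is refreshed only after a successful commit.
        self.cache[key] = (content, expected_version + 1)


store = MetadataStore(sqlite3.connect(":memory:"))
store.create("m1", "{}")
store.update("m1", '{"alias": "orders"}', expected_version=0)
```

A second writer that still holds version 0 would get a ConflictError and would have to reload and retry, which is one way a database can arbitrate concurrent updates from several nodes without a per-project owner.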
h4. Audit log transformation
In Kylin 4, the audit log records the complete metadata after every update, 
so the JSON can become very large. This change adds an option to record only 
incremental information, which reduces network IO when synchronizing the 
audit log.
For example, the complete JSON is recorded when new metadata is added:
!https://kyligence.feishu.cn/space/api/box/stream/download/asynccode/?code=MWJlZTZjOTY3NDFjNDhjYmNiZjVmNDc5ZGVkZDRkMWFfZ0pENUl1U2VwS2ZhZVhJa0tOdWxwU1M3MjRLeUt1a2VfVG9rZW46RkFKSWJsdkIyb0xabFN4Um96M2NKZ2ZJbm9lXzE3MjI0MjAxOTg6MTcyMjQyMzc5OF9WNA|width=711,height=491!
Subsequent updates only record the incremental part.
!https://kyligence.feishu.cn/space/api/box/stream/download/asynccode/?code=ZDVlZjQ1ZWFlMjVjMDg4ZTg1YzFhZTc3YTZkZWM5MDNfVFc2amVnQmJocmJMQlBadVVZSVJGVkFNS0ROSTdqWkpfVG9rZW46RzNYYmI0aklZb3JMWXV4a2QxTGNlazhMbmJnXzE3MjI0MjAxOTg6MTcyMjQyMzc5OF9WNA|width=707,height=459!
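One way to produce and replay such an incremental record is sketched below, in the style of JSON Merge Patch (RFC 7396): only changed fields are kept, and a removed field is recorded as null. The function names and the sample fields are hypothetical, and the actual incremental format used by this change is the one shown in the screenshots, which may differ.

```python
def diff_metadata(old, new):
    """Return only the fields of `new` that differ from `old`.

    Removed keys map to None, so a field whose real value is None
    cannot be represented in this simplified format.
    """
    delta = {}
    for key, value in new.items():
        if key not in old:
            delta[key] = value
        elif isinstance(value, dict) and isinstance(old[key], dict):
            nested = diff_metadata(old[key], value)
            if nested:
                delta[key] = nested
        elif old[key] != value:
            delta[key] = value
    for key in old.keys() - new.keys():
        delta[key] = None
    return delta


def apply_delta(old, delta):
    """Rebuild the full metadata from the previous version plus a delta."""
    merged = dict(old)
    for key, value in delta.items():
        if value is None:
            merged.pop(key, None)
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_delta(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical metadata; the field names are for illustration only.
old = {"uuid": "m1", "version": 0, "alias": "orders",
       "measures": {"count": "COUNT(1)"}}
new = {"uuid": "m1", "version": 1, "alias": "orders",
       "measures": {"count": "COUNT(1)", "sum_price": "SUM(price)"}}

delta = diff_metadata(old, new)
restored = apply_delta(old, delta)
```

Here the delta carries only the new version number and the added measure, while a node that already holds the previous version can rebuild the full document from it.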
h4. Remove epoch
 # A project is no longer assigned an owner; any job node can concurrently 
update the metadata of any project.
 # The logic that forwarded requests based on epoch is removed. If the 
resource group function is enabled, requests are still forwarded to the 
corresponding resource group.
 # Scheduled tasks that relied on epoch, such as garbage cleaning and 
automatic snapshot refresh, are converted into jobs that the task scheduler 
schedules uniformly. These jobs are not visible on the task interface.

> metadata refactor
> -----------------
>
>                 Key: KYLIN-5945
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5945
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Metadata
>    Affects Versions: 5.0-alpha
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Major
>             Fix For: 5.0.0
>
>         Attachments: Metadata deep dive (en).pdf, metadata (3).pdf
>
>
> Kylin's metadata has some architectural issues, such as: 
> 1. The metadata adopts a key-value structure. The JSON block contains a lot 
> of content, which will grow with the growth of customer business, resulting 
> in excessive network IO when updating metadata.
> 2. The design of epoch and project lock ensures that transactions within the 
> project are executed serially, resulting in single-point bottlenecks and 
> concurrency problems.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
