[
https://issues.apache.org/jira/browse/KYLIN-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871021#comment-17871021
]
ASF subversion and git services commented on KYLIN-5949:
--------------------------------------------------------
Commit 64408cefd76fdc1a4d82e9c8f32d2e11aa09a016 in kylin's branch
refs/heads/kylin5 from Zhimin Wu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=64408cefd7 ]
KYLIN-5949 Kylin supports Delta Lake as Index storage
1. Support Delta Lake as Index storage.
2. When querying, you can choose to cache the Delta Log on the driver or in RDD
Cache mode.
3. V1 and V3 storage is isolated at the model level.
4. Data storage is no longer divided into segments.
5. Query storage optimization can be performed at the index level.
Co-authored-by: Mingming Ge
> Support DeltaLake as Index storage
> ----------------------------------
>
> Key: KYLIN-5949
> URL: https://issues.apache.org/jira/browse/KYLIN-5949
> Project: Kylin
> Issue Type: New Feature
> Components: Job Engine, Query Engine
> Affects Versions: 5.0.0
> Reporter: pengfei.zhan
> Assignee: Zhimin Wu
> Priority: Major
> Fix For: 5.0.0
>
> Attachments: image-2024-08-05-17-48-53-286.png, image.png
>
>
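> h3. Design goals
> # Logical segments: segments no longer manage index data; a segment is kept only as a logical concept.
> # Indexes stored as tables: each index type gets its own table type. Storing indexes as tables makes better use of the query engine's table-processing capabilities.
> # Extensible index storage type: the default storage changes from Parquet to Delta Lake, and can be swapped quickly for Iceberg or Hudi.
> # Dynamic runtime parameters for build and query: execution engine parameters are adjusted at runtime (during both build and query) according to the characteristics of the index.
> # Stable query performance: queries should perform relatively consistently on both older and recent data.
> # Targeted index optimization: the index that serves a specific query can be optimized directly, so that particular queries can be accelerated as far as possible.
> h3. Changes to the storage format
> h4. Original Segment + Parquet storage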
> {code:java}
> # V1 cube result data file layout
> parquet/
> └── dc65dd61-dbe3-8f46-7d44-668b688b96c1 (model ID)
>     └── 12d2c4c1-248f-b1f8-0bdb-88b0eb9c8580 (segment ID)
>         ├── 1 (aggregate index ID)
>         │   └── part-00000-393b8b08-84fc-40c6-8c2e-d579485dcc57-c000.snappy.parquet (data)
>         ├── 10001
>         ├── 20001
>         ├── 30001
>         ├── 40001
>         └── 20000000001 (table/detail index ID){code}
> h4. V3 file format - data is organized by Delta Lake and stored as Parquet
> !image-2024-08-05-17-48-53-286.png|width=666,height=391!
>
>
>
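To make the V1 directory layout quoted above concrete, here is a minimal Python sketch that splits a V1 data file path into its model, segment, and index IDs. It is an illustration only, not code from Kylin: the names `V1Location`, `parse_v1_path`, and `TABLE_INDEX_START` are hypothetical, and the threshold separating aggregate indexes from table (detail) indexes is an assumption inferred from the single `20000000001` example in the listing.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

# Assumption: index IDs above this value are table (detail) indexes. The listing
# only shows 20000000001 as a table index and small IDs as aggregate indexes.
TABLE_INDEX_START = 20_000_000_000


class V1Location(NamedTuple):
    """Hypothetical holder for the three ID levels of the V1 layout."""

    model_id: str
    segment_id: str
    index_id: int

    @property
    def is_table_index(self) -> bool:
        return self.index_id > TABLE_INDEX_START


def parse_v1_path(path: str) -> V1Location:
    """Split parquet/<model ID>/<segment ID>/<index ID>[/<file>] into its IDs."""
    parts = PurePosixPath(path).parts
    root = parts.index("parquet")  # raises ValueError if the root is missing
    model_id, segment_id, index_id = parts[root + 1 : root + 4]
    return V1Location(model_id, segment_id, int(index_id))


if __name__ == "__main__":
    loc = parse_v1_path(
        "parquet/dc65dd61-dbe3-8f46-7d44-668b688b96c1/"
        "12d2c4c1-248f-b1f8-0bdb-88b0eb9c8580/1/"
        "part-00000-393b8b08-84fc-40c6-8c2e-d579485dcc57-c000.snappy.parquet"
    )
    print(loc, loc.is_table_index)
```

The segment ID sits between the model and the index in every V1 path, which is the coupling that design goal 1 removes. The V3 layout itself appears only in the attached image, so it is not sketched here.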
--
This message was sent by Atlassian Jira
(v8.20.10#820010)