xinghuayu007 opened a new pull request #7149:
URL: https://github.com/apache/incubator-doris/pull/7149


   ## Proposed changes
   
   **For issue: #6359** 
   
   ### Background
   
   
![image](https://user-images.githubusercontent.com/12771191/142423932-6a2c3349-3f7d-4161-9626-62e4d1eafc16.png)
   
   
![image](https://user-images.githubusercontent.com/12771191/142423979-2c5059b7-5d6d-4aa8-ab86-aeffd1034392.png)
   
![image](https://user-images.githubusercontent.com/12771191/142424023-79d838e1-b838-486f-abef-990514d5ea29.png)
   
   ### Z-Order
   
![image](https://user-images.githubusercontent.com/12771191/142424158-8a93f3fc-8a76-4539-a39d-8ad45059c80a.png)
   
![image](https://user-images.githubusercontent.com/12771191/142424196-254321a5-ef49-4a90-b693-cf563105dfd7.png)
   
   ### Application Scenarios
   
   
![image](https://user-images.githubusercontent.com/12771191/142424452-bba445df-787b-4993-b486-1c387f2115b2.png)
   
   ### Syntax
   
   >  CREATE TABLE `table2` (
     `siteid` int(11) NULL DEFAULT "10" COMMENT "",
     `citycode` int(11) NULL COMMENT "",
     `username` varchar(32) NULL DEFAULT "" COMMENT "",
     `pv` bigint(20) NULL DEFAULT "0" COMMENT ""
   ) ENGINE=OLAP
   DUPLICATE KEY(`siteid`, `citycode`)
   COMMENT "OLAP"
   DISTRIBUTED BY HASH(`siteid`) BUCKETS 1
   PROPERTIES (
   "replication_allocation" = "tag.location.default: 1",
   "data_sort.sort_type" = "ZORDER",
   "data_sort.col_num" = "2",
   "in_memory" = "false",
   "storage_format" = "V2"
   );
   
   - `data_sort.sort_type`: the sort type, either lexical or z-order. The default is lexical.
   - `data_sort.col_num`: the number of leading columns used as the sort key.
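
To make the effect of these two properties concrete, here is a minimal Python sketch of z-ordering on the first `col_num` columns. It is an illustration only, not the code in this PR: the helper names `interleave_bits` and `zorder_sort` are hypothetical, and it assumes non-negative integer columns (a real implementation would also need to handle signed values and other column types).

```python
def interleave_bits(values, bits=32):
    """Interleave the bits of several non-negative integers into one z-value."""
    z = 0
    n = len(values)
    for bit in range(bits):
        for i, v in enumerate(values):
            # Bit `bit` of column i lands at position bit*n + (n-1-i), so the
            # first column is the most significant one within each bit group.
            z |= ((v >> bit) & 1) << (bit * n + (n - 1 - i))
    return z


def zorder_sort(rows, col_num, bits=32):
    """Sort rows by the z-value of their first `col_num` columns (hypothetical helper)."""
    return sorted(rows, key=lambda r: interleave_bits(r[:col_num], bits))


# Toy data: every (siteid, citycode) pair with values 0..3.
rows = [(siteid, citycode) for siteid in range(4) for citycode in range(4)]
ordered = zorder_sort(rows, col_num=2, bits=2)

# Neighbouring rows stay close in both columns, not only in the first one.
print(ordered[:8])
```

With a lexical sort the same rows would be ordered by `siteid` first and `citycode` second, so rows sharing a `citycode` would be spread evenly across the whole file.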
   
   ### Performance Test
   
   **Load Performance**
   Env: SSB scale 100, stream load
   ![Stream Load Performance(s)](https://user-images.githubusercontent.com/12771191/142427125-5190dbb2-f1cd-4aa2-9aaa-d3fe4ed1b37d.png)
   
   
   **Query Performance**
   Env: TPCH scale 25
   Table:
   
   > CREATE TABLE `LINEITEM` (
     `L_PARTKEY` int(11) NOT NULL COMMENT "",
     `L_SUPPKEY` int(11) NOT NULL COMMENT "",
     `L_ORDERKEY` int(11) NOT NULL COMMENT "",
     `L_LINENUMBER` int(11) NOT NULL COMMENT "",
     `L_QUANTITY` decimal(15, 2) NOT NULL COMMENT "",
     `L_EXTENDEDPRICE` decimal(15, 2) NOT NULL COMMENT "",
     `L_DISCOUNT` decimal(15, 2) NOT NULL COMMENT "",
     `L_TAX` decimal(15, 2) NOT NULL COMMENT "",
     `L_RETURNFLAG` char(1) NOT NULL COMMENT "",
     `L_LINESTATUS` char(1) NOT NULL COMMENT "",
     `L_SHIPDATE` date NOT NULL COMMENT "",
     `L_COMMITDATE` date NOT NULL COMMENT "",
     `L_RECEIPTDATE` date NOT NULL COMMENT "",
     `L_SHIPINSTRUCT` char(25) NOT NULL COMMENT "",
     `L_SHIPMODE` char(10) NOT NULL COMMENT "",
     `L_COMMENT` varchar(44) NOT NULL COMMENT "",
     `L_DEFAULT` varchar(44) NULL COMMENT ""
   ) ENGINE=OLAP
   DUPLICATE KEY(`L_PARTKEY`, `L_SUPPKEY`)
   COMMENT "OLAP"
   DISTRIBUTED BY HASH(`L_PARTKEY`) BUCKETS 1
   PROPERTIES (
   "replication_num" = "3",
   "in_memory" = "false",
   "storage_format" = "V2"
   ); 
   
   Q1: select  count(1) from LINEITEM where L_SUPPKEY=125019;
   Q2: select  count(1) from LINEITEM where L_SUPPKEY=125019;
   
   ![Query Performance(s)](https://user-images.githubusercontent.com/12771191/142428386-555e9cd1-8a65-44b4-b487-cf7c479f2c50.png)
   
   ![Percent of Filtered Data Pages](https://user-images.githubusercontent.com/12771191/142428653-83f66f3e-7ab5-4f6c-a455-5758c8693f5f.png)
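
As a rough intuition for why the sort order changes the share of filtered pages, the toy model below counts how many fixed-size pages can be skipped for an equality predicate. It assumes pages are pruned with per-page min/max statistics; that pruning model, the page size of 16 rows, and the helper names `z_value` and `skipped_pages` are assumptions made for this sketch, not details taken from this PR or from the measurements above.

```python
def z_value(x, y, bits=4):
    """Interleave the bits of two non-negative ints, with x as the more significant column."""
    z = 0
    for bit in range(bits):
        z |= ((x >> bit) & 1) << (2 * bit + 1)
        z |= ((y >> bit) & 1) << (2 * bit)
    return z


def skipped_pages(rows, col, target, page_size=16):
    """Count pages whose [min, max] range on column `col` cannot contain `target`."""
    skipped = 0
    for start in range(0, len(rows), page_size):
        page = [r[col] for r in rows[start:start + page_size]]
        if target < min(page) or target > max(page):
            skipped += 1
    return skipped


# Toy table: every (a, b) pair with values 0..15, i.e. 256 rows in 16 pages.
rows = [(a, b) for a in range(16) for b in range(16)]
lexical = sorted(rows)
zorder = sorted(rows, key=lambda r: z_value(r[0], r[1]))

for name, data in (("lexical", lexical), ("z-order", zorder)):
    on_first = skipped_pages(data, col=0, target=5)
    on_second = skipped_pages(data, col=1, target=5)
    print(f"{name}: skipped {on_first}/16 pages for a=5, {on_second}/16 for b=5")
```

In this toy setup the lexical order prunes almost everything for a predicate on the first column and nothing for a predicate on the second, while the z-order prunes a similar share for both. Real data distributions and page sizes will give different numbers.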
   
   
   ### Limitation
   
   1. Only the duplicate model supports z-order.
   2. The short key index is not available with z-order.
   3. Only the V2 storage format supports z-order.
   
   ### Future work
   
   1. Support z-order in the aggregate and unique models.
   2. Try a Hilbert curve to cluster the data more tightly.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   - [ ] Optimization. Including functional usability improvements and 
performance improvements.
   - [ ] Dependency. Such as changes related to third-party components.
   - [ ] Other.
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after 
creating the PR. If you're unsure about any of them, don't hesitate to ask. 
We're here to help! This is simply a reminder of what we are going to look for 
before merging your code._
   
   - [ ] I have created an issue on (Fix #ISSUE) and described the bug/feature 
there in detail
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature 
works
   - [ ] If these changes need document changes, I have updated the document
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[email protected] by explaining why you chose the solution you did and what 
alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


