ulysses-you opened a new issue #934:
URL: https://github.com/apache/incubator-kyuubi/issues/934


   # Describe the proposal
   As we know, [Z-order](https://en.wikipedia.org/wiki/Z-order_curve) maps multidimensional data to one dimension while largely preserving locality, which benefits data skipping. Besides, Z-order can provide a good compression ratio for column-based storage.
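   To make the "multidimensional to one dimension" mapping concrete, here is a minimal Python sketch of a Z-order (Morton) value computed by interleaving the bits of each column value. It assumes the columns have already been mapped to small non-negative integers; a real engine must also encode strings, negative numbers, and floats, and this is not Kyuubi's implementation. The function name `z_value` is hypothetical.

```python
def z_value(point, bits=8):
    """Interleave the bits of each non-negative integer in `point`,
    most significant bit first, to produce a Z-order (Morton) value.
    Assumes every coordinate fits in `bits` bits."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for v in point:
            # Append bit number `bit` of each coordinate in turn.
            z = (z << 1) | ((v >> bit) & 1)
    return z


# On a 2x2 grid, sorting by z_value walks the cells in a "Z" shape.
grid = [(1, 1), (0, 0), (1, 0), (0, 1)]
print(sorted(grid, key=z_value))  # -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

   Points that are close in every dimension tend to get nearby z-values, which is why sorting by this value clusters rows for data skipping on several columns at once.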
   
   The additional cost of Z-order is that we need to do a special "order", which includes an extra shuffle, before writing data.
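   The sketch below, which assumes a toy in-memory table, illustrates both the cost and the benefit. Rows must be globally sorted by their z-value before they are split into files (in a distributed engine this global sort is what requires the extra shuffle), and per-file min/max statistics then let a reader skip files. The helpers `write_files` and `files_to_scan` are hypothetical, not Kyuubi or Spark APIs.

```python
from itertools import product


def z_value(point, bits=8):
    """Interleave coordinate bits (MSB first) into a Z-order value."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for v in point:
            z = (z << 1) | ((v >> bit) & 1)
    return z


def write_files(rows, rows_per_file, key=None):
    """Optionally sort rows (the step that needs an extra shuffle in a
    distributed engine), then cut them into fixed-size "files" and keep
    per-column (min, max) statistics for each file."""
    ordered = sorted(rows, key=key) if key else list(rows)
    files = []
    for i in range(0, len(ordered), rows_per_file):
        chunk = ordered[i:i + rows_per_file]
        files.append([(min(col), max(col)) for col in zip(*chunk)])
    return files


def files_to_scan(files, col, lo, hi):
    """Count files whose [min, max] for column `col` overlaps the filter [lo, hi]."""
    return sum(1 for stats in files if stats[col][0] <= hi and stats[col][1] >= lo)


rows = list(product(range(16), range(16)))   # 256 (x, y) rows in x-major order
linear = write_files(rows, 16)               # written as-is: each file holds one x value
zorder = write_files(rows, 16, key=z_value)  # Z-ordered: each file is a 4x4 block
print(files_to_scan(linear, 0, 3, 3), files_to_scan(linear, 1, 3, 3))  # -> 1 16
print(files_to_scan(zorder, 0, 3, 3), files_to_scan(zorder, 1, 3, 3))  # -> 4 4
```

   With the plain x-major layout, a filter on `x` reads 1 of 16 files but a filter on `y` reads all 16; after Z-ordering, both filters read 4 files. Z-order trades some skipping on the leading column for balanced skipping across all ordered columns.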
   
   Given this, Kyuubi wants to find a better way to apply the Z-order optimization. In short, the basic questions are:
   * how to choose which table and columns to optimize using Z-order
   * how to confirm that the optimized table is effective
   
   **For question 1:**
   We can analyze the query metrics to find the relationships between queries, then choose a hot table whose predicate distribution is concentrated.
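   As a rough illustration of question 1, the following sketch assumes the metrics have already been reduced to one `(table, predicate_columns)` record per scan; the record shape, the thresholds, and the function name `suggest_zorder_candidates` are all hypothetical. It keeps tables that are scanned often and whose predicates mostly fall within a few columns, and proposes those columns for Z-order.

```python
from collections import Counter, defaultdict


def suggest_zorder_candidates(scans, min_scans=3, top_k=2, min_concentration=0.8):
    """Suggest Z-order columns for hot tables with concentrated predicates.

    `scans` is a list of (table, set_of_predicate_columns) records, one per
    table scan, taken from hypothetical query metrics.
    """
    scans = list(scans)
    scan_count = Counter()
    column_hits = defaultdict(Counter)
    for table, columns in scans:
        scan_count[table] += 1
        column_hits[table].update(columns)

    candidates = {}
    for table, n in scan_count.items():
        if n < min_scans:
            continue  # not hot enough to justify the extra write-time shuffle
        top = {c for c, _ in column_hits[table].most_common(top_k)}
        # Share of scans that filter only on (a non-empty subset of) the top columns.
        covered = sum(1 for t, cols in scans if t == table and cols and cols <= top)
        if covered / n >= min_concentration:
            candidates[table] = sorted(top)
    return candidates


metrics = [
    ("sales", {"date", "region"}),
    ("sales", {"date"}),
    ("sales", {"region", "date"}),
    ("sales", {"region"}),
    ("logs", {"user"}),
    ("logs", {"ts"}),
    ("logs", {"level"}),
    ("dim", {"id"}),
]
print(suggest_zorder_candidates(metrics))  # -> {'sales': ['date', 'region']}
```

   Here `logs` is scanned often but its predicates are spread over three columns, and `dim` is not hot enough, so only `sales` is suggested.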
   
   **For question 2:**
   We can also analyze the metrics to see whether the queries that scan the optimized table benefit from it, and roll back if performance regresses.
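   Similarly for question 2, a minimal sketch could compare the scan latency of queries on the table before and after optimization and flag a rollback on regression. The inputs (per-query scan durations in milliseconds), the 5% tolerance, and the name `evaluate_zorder` are assumptions for illustration; the median is used so that a few outlier queries do not dominate the decision.

```python
from statistics import median


def evaluate_zorder(before_ms, after_ms, tolerance=0.05):
    """Compare scan durations (ms) of queries on one table before and after
    Z-order optimization and return 'keep', 'rollback', or 'neutral'."""
    if not before_ms or not after_ms:
        raise ValueError("need scan metrics from both periods")
    before, after = median(before_ms), median(after_ms)
    if after > before * (1 + tolerance):
        return "rollback"  # regression beyond the tolerance band
    if after < before * (1 - tolerance):
        return "keep"  # clear improvement
    return "neutral"  # within noise


print(evaluate_zorder([100, 120, 110], [60, 70, 65]))     # -> keep
print(evaluate_zorder([100, 100, 100], [130, 120, 125]))  # -> rollback
```

   Since Z-order adds a shuffle at write time, a `neutral` result may also be a reason to roll back; that is a policy choice rather than something the metrics decide.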
   
   # Task list
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
