Zoltán Borók-Nagy created IMPALA-8755:
-----------------------------------------

             Summary: Implement Z-ordering for Impala
                 Key: IMPALA-8755
                 URL: https://issues.apache.org/jira/browse/IMPALA-8755
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Zoltán Borók-Nagy


Implement Z-ordering for Impala: [https://en.wikipedia.org/wiki/Z-order_curve]

A Z-order curve defines an ordering on multi-dimensional data. Data sorted that 
way can be filtered efficiently using min/max statistics on any of the columns 
participating in the ordering.
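
To make the ordering concrete, here is a minimal Python sketch of the usual way a Z-order (Morton) key is built: by interleaving the bits of the column values. It is an illustration only, not Impala code. The function name interleave_bits, the restriction to non-negative integers, and the fixed bit width are all assumptions made for this example.

```python
def interleave_bits(values, bits=8):
    """Return a Z-order (Morton) key for a tuple of non-negative ints.

    Bit i of dimension d lands at position i * ndims + d of the key, so
    the low-order bits of every dimension sit next to each other.
    Assumes each value fits in `bits` bits (an assumption of this sketch).
    """
    ndims = len(values)
    key = 0
    for bit in range(bits):
        for dim, value in enumerate(values):
            key |= ((value >> bit) & 1) << (bit * ndims + dim)
    return key


# Sort a small 4x4 grid of (x, y) points along the Z-order curve.
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: interleave_bits(p, bits=2))
print(z_sorted[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The first four points form a 2x2 square, which is the property that matters here: rows that are close on the curve are close in every dimension, so min/max ranges stay narrow for all participating columns.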

Impala currently only supports lexicographic ordering via the SORT BY clause. 
Lexicographic ordering strongly favors the first column. For example, given 
"SORT BY A, B, C", A is totally ordered (so filtering on A is very efficient), 
but the values of B and C are scattered throughout the data set (so filtering 
on B or C barely helps).

We could add a new clause, e.g. a "ZSORT BY" clause to Impala that writes the 
data in Z-order.

"ZSORT BY A, B, C" would cluster the rows in a way that filtering on A, B, or C 
would be equally efficient.
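
The effect on min/max filtering can be sketched with a small simulation in Python. This is not how Impala or Parquet is implemented. It assumes a two-column table with integer values 0..15, fixed-size chunks of 16 rows standing in for whatever unit carries min/max statistics, and hypothetical helper names (z_key, chunks_to_scan). A chunk must be scanned whenever the filter value falls inside its [min, max] range.

```python
def z_key(values, bits):
    """Interleave the bits of non-negative ints into one Z-order key."""
    key = 0
    for bit in range(bits):
        for dim, value in enumerate(values):
            key |= ((value >> bit) & 1) << (bit * len(values) + dim)
    return key


def chunks_to_scan(rows, column, target, chunk_size):
    """Count chunks whose min/max range for `column` contains `target`."""
    count = 0
    for start in range(0, len(rows), chunk_size):
        vals = [row[column] for row in rows[start:start + chunk_size]]
        if min(vals) <= target <= max(vals):
            count += 1
    return count


# Every (A, B) combination for A, B in 0..15: 256 rows, 16 chunks of 16.
rows = [(a, b) for a in range(16) for b in range(16)]
lexicographic = sorted(rows)                              # like SORT BY A, B
z_ordered = sorted(rows, key=lambda r: z_key(r, bits=4))  # like ZSORT BY A, B

for name, data in (("SORT BY A, B", lexicographic),
                   ("ZSORT BY A, B", z_ordered)):
    a_hits = chunks_to_scan(data, 0, 5, 16)
    b_hits = chunks_to_scan(data, 1, 5, 16)
    print(f"{name}: A=5 scans {a_hits}/16 chunks, B=5 scans {b_hits}/16 chunks")
# SORT BY A, B: A=5 scans 1/16 chunks, B=5 scans 16/16 chunks
# ZSORT BY A, B: A=5 scans 4/16 chunks, B=5 scans 4/16 chunks
```

In this toy setup the lexicographic order prunes perfectly on A and not at all on B, while the Z-order prunes both columns equally, at the cost of somewhat weaker pruning on A than a plain sort on A would give.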


