[
https://issues.apache.org/jira/browse/IMPALA-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938941#comment-16938941
]
ASF subversion and git services commented on IMPALA-8755:
---------------------------------------------------------
Commit 288c8c41b530e2a54850257c3c70d16328dbcf0b in impala's branch
refs/heads/master from norbert.luksa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=288c8c4 ]
IMPALA-8755: Frontend support for Z-ordering
Extended the SQL grammar with two optional keywords for SORT BY:
ZORDER and LEXICAL. If ZORDER is given, the new 'sort.algorithm'
table property is set to ZORDER and the information is passed
down to the backend. LEXICAL denotes the default order
and can be omitted. Examples:
CREATE TABLE t (a INT, b INT) PARTITIONED BY (c INT)
SORT BY ZORDER (a, b);
CREATE TABLE t SORT BY ZORDER (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY ZORDER (id,zip);
ALTER TABLE t SORT BY ZORDER (int_col,id);
The following two statements are equivalent:
CREATE TABLE t (a INT, b INT) SORT BY (a, b);
CREATE TABLE t (a INT, b INT) SORT BY LEXICAL (a, b);
Z-ordering is currently not supported for strings, varchars, floats
and doubles. It is not suitable for strings and varchars, but
support can be added for floats and doubles later. The supported
types are: boolean, int types, decimals, date, timestamp, and char.
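
To make the type rule above concrete, here is a minimal sketch in Python.
It is not the analyzer code from this commit: the helper names are
hypothetical, the type names are assumed to follow Impala's SQL spelling,
and treating every type not listed above as unsupported is an assumption
of the sketch.

```python
# Hypothetical illustration, not Impala code.
# Types listed as supported in the commit message ("int types" is
# assumed to mean TINYINT, SMALLINT, INT and BIGINT).
ZORDER_SUPPORTED = {
    "BOOLEAN", "TINYINT", "SMALLINT", "INT", "BIGINT",
    "DECIMAL", "DATE", "TIMESTAMP", "CHAR",
}


def base_type(type_name):
    """Strip parameters such as (10,2) and normalize the case."""
    return type_name.split("(", 1)[0].strip().upper()


def unsupported_zorder_columns(schema, sort_columns):
    """Return the sort columns whose type is not in the supported set.

    Assumption: anything not listed as supported is rejected.
    """
    return [c for c in sort_columns
            if base_type(schema[c]) not in ZORDER_SUPPORTED]


schema = {"a": "INT", "b": "decimal(10,2)", "s": "STRING",
          "v": "VARCHAR(10)", "c": "CHAR(5)"}
print(unsupported_zorder_columns(schema, ["a", "b", "c"]))
print(unsupported_zorder_columns(schema, ["a", "s", "v"]))
```

The first call reports no offending columns, the second reports the
STRING and VARCHAR columns.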
Currently ZORDER has the same functionality as a simple SORT BY clause,
and is therefore hidden behind a feature flag: unlock_zorder. The custom
sorting with Z-ordering will be added in a later commit.
Testing:
* Added tests for the ZORDER option for every SORT BY test.
* Modified some tests by adding the LEXICAL option.
* The .test workloads are temporarily put in separate test files
in order to set up the feature flag. These tests are run from
tests/custom_cluster/test_zorder.py, which duplicates the
relevant tests but with the CustomClusterTestSuite decorator.
Change-Id: Ie122002ca8f52ca2c1e1ec8ff1d476ae1f4f875d
Reviewed-on: http://gerrit.cloudera.org:8080/13955
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Implement Z-ordering for Impala
> -------------------------------
>
> Key: IMPALA-8755
> URL: https://issues.apache.org/jira/browse/IMPALA-8755
> Project: IMPALA
> Issue Type: New Feature
> Reporter: Zoltán Borók-Nagy
> Assignee: Norbert Luksa
> Priority: Major
>
> Implement Z-ordering for Impala: [https://en.wikipedia.org/wiki/Z-order_curve]
> A Z-order curve defines an ordering on multi-dimensional data. Data sorted
> that way can be efficiently filtered by min/max statistics on the
> columns participating in the ordering.
> Impala currently only supports lexicographic ordering via the SORT BY clause.
> This strongly prefers the first column, i.e. given the "SORT BY A, B, C"
> clause => A will be totally ordered (hence filtering on A will be very
> efficient), but values belonging to B and C will be scattered throughout the
> data set (hence filtering on B or C will barely do any good).
> We could add a new clause, e.g. a "ZSORT BY" clause to Impala that writes the
> data in Z-order.
> "ZSORT BY A, B, C" would cluster the rows in a way that filtering on A, B, or
> C would be equally efficient.
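
The difference described above can be illustrated with a small Python
sketch. It uses the textbook bit-interleaving (Morton code) construction
of a Z-order curve on two small non-negative integer columns; it is not
the sorting that Impala's backend will use, and the function names, the
8x8 data set and the block size of 16 rows are assumptions made for
illustration only.

```python
def interleave_bits(x, y, bits=3):
    """Morton code: bit i of x goes to position 2*i+1, bit i of y to 2*i."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


def block_ranges(rows, block_size, col):
    """Per-block (min, max) of one column, standing in for min/max stats."""
    ranges = []
    for start in range(0, len(rows), block_size):
        values = [r[col] for r in rows[start:start + block_size]]
        ranges.append((min(values), max(values)))
    return ranges


def blocks_to_scan(rows, block_size, col, value):
    """Count blocks whose min/max range cannot rule out `col = value`."""
    return sum(lo <= value <= hi
               for lo, hi in block_ranges(rows, block_size, col))


# Every (a, b) pair with a, b in 0..7, written out in blocks of 16 rows.
rows = [(a, b) for a in range(8) for b in range(8)]
lexical = sorted(rows)  # like SORT BY (a, b)
zorder = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))

for name, data in (("lexical", lexical), ("zorder", zorder)):
    print(name,
          "a=3 scans", blocks_to_scan(data, 16, 0, 3), "blocks;",
          "b=3 scans", blocks_to_scan(data, 16, 1, 3), "blocks")
```

With the lexical order a filter on a touches 1 of the 4 blocks while a
filter on b touches all 4; with the Z-order both filters touch 2 blocks
each, so neither column is favored.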
--
This message was sent by Atlassian Jira
(v8.3.4#803005)