[
https://issues.apache.org/jira/browse/IMPALA-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Tauber-Marshall resolved IMPALA-6954.
--------------------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.13.0
> Kudu CTAS Loses Partitioning
> ----------------------------
>
> Key: IMPALA-6954
> URL: https://issues.apache.org/jira/browse/IMPALA-6954
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Alan Jackoway
> Assignee: Thomas Tauber-Marshall
> Priority: Critical
> Fix For: Impala 2.13.0
>
>
> For certain queries, a CTAS (CREATE TABLE AS SELECT) into a table stored as
> Kudu loses the partitioning specified in the statement.
> To reproduce, first create a transactions table:
> {code:sql}
> create table alanj_transactions(account_id string, transaction_id string,
> total double, close_date string)
> {code}
> No data needs to be loaded into it. Next, create a Kudu table from it,
> computing the longest tenure per account (days from the earliest close date
> to now):
> {code:sql}
> create table alanj_kudu
> primary key (account_id)
> partition by hash(account_id) partitions 5
> stored as kudu
> as
> select account_id,
> datediff(now(), min(cast(close_date AS TIMESTAMP))) AS tenure_days
> from alanj_transactions
> group by 1
> {code}
> The statement returns a warning like "Unpartitioned Kudu tables are
> inefficient for large data sizes." Both {{show create table}} and the Kudu
> web UI confirm that the partitions were not created.
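> As a sketch of that check, the statement below can be run against the table
> created above. Its output is omitted here because none was recorded in this
> report:
> {code:sql}
> -- prints the DDL Impala stores for the table, including any partition clause
> show create table alanj_kudu
> {code}
> In the failing case, the printed DDL would be expected to lack the
> {{partition by hash(account_id)}} clause.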
> If you replace the datediff line with something like {{sum(total) as
> account_total}}, the table is created with its partitions as expected.
> Something about the datediff expression appears to cause the partitioning
> to be lost.
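> For comparison, a sketch of the variant that works, reconstructed from the
> description above. The table name {{alanj_kudu_sum}} is hypothetical, chosen
> only to avoid clashing with {{alanj_kudu}}; everything else matches the
> failing statement except the select expression:
> {code:sql}
> -- hypothetical table name; only the aggregate expression differs
> create table alanj_kudu_sum
> primary key (account_id)
> partition by hash(account_id) partitions 5
> stored as kudu
> as
> select account_id,
> sum(total) as account_total
> from alanj_transactions
> group by 1
> {code}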