[https://issues.apache.org/jira/browse/IMPALA-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Quanlong Huang updated IMPALA-14698:
------------------------------------
Description:
Users might need to update tblproperties of multiple partitions, e.g.
explicitly setting numRows and STATS_GENERATED_VIA_STATS_TASK=true in an ETL
pipeline:
{code:sql}
alter table my_tbl partition (p=1) set tblproperties('numRows'='1035',
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=2) set tblproperties('numRows'='1030',
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=3) set tblproperties('numRows'='1000',
'STATS_GENERATED_VIA_STATS_TASK'='true');
{code}
Similar to IMPALA-14089, supporting this in a single statement helps reduce
lock contention on the catalogd side. The table version also changes less
frequently, which avoids query planning retries (IMPALA-14695). Modifying
multiple partitions in a single operation also speeds up metadata reloading,
since those partitions are reloaded in parallel.
The statement could look like this:
{code:sql}
alter table my_tbl
partition (p=1) set tblproperties('numRows'='1035',
'STATS_GENERATED_VIA_STATS_TASK'='true')
partition (p=2) set tblproperties('numRows'='1030',
'STATS_GENERATED_VIA_STATS_TASK'='true')
partition (p=3) set tblproperties('numRows'='1000',
'STATS_GENERATED_VIA_STATS_TASK'='true');{code}
Note that we already support multiple partitions when the operation is the
same for all of them, e.g. setting the same tblproperties:
{code:sql}
alter table my_alltypes partition (year in (2009, 2010), month in (1,2,3)) set
tblproperties('numRows'='10');{code}
What we are missing is setting different tblproperties for different partitions
in a single command.
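In an ETL pipeline the per-partition values usually come from a job, so the combined statement would likely be generated rather than hand-written. Below is a minimal sketch in Python, assuming the proposed syntax above is accepted as written (it is not supported yet). The helper names build_alter and quote are hypothetical, partition values are passed in as ready-made SQL literals, and quote characters are rejected rather than escaped to avoid relying on any particular escaping rules.

{code:python}
from typing import List, Mapping, Tuple

# A partition spec is a tuple of (column, SQL literal) pairs, e.g. (("p", "1"),).
PartitionSpec = Tuple[Tuple[str, str], ...]


def quote(s: str) -> str:
    # Simplification (assumption): reject quotes/backslashes instead of escaping.
    if "'" in s or "\\" in s:
        raise ValueError(f"unsupported character in {s!r}")
    return f"'{s}'"


def build_alter(table: str, props: Mapping[PartitionSpec, Mapping[str, str]]) -> str:
    """Render the PROPOSED multi-partition ALTER TABLE syntax (IMPALA-14698)."""
    lines: List[str] = [f"alter table {table}"]
    for spec, kv in props.items():
        part = ", ".join(f"{col}={lit}" for col, lit in spec)
        tbl_props = ", ".join(f"{quote(k)}={quote(v)}" for k, v in kv.items())
        # One "partition (...) set tblproperties(...)" clause per partition.
        lines.append(f"partition ({part}) set tblproperties({tbl_props})")
    return "\n".join(lines) + ";"


# Example: per-partition row counts computed by a (hypothetical) ETL job.
row_counts = {"1": 1035, "2": 1030, "3": 1000}
stmt = build_alter(
    "my_tbl",
    {
        (("p", p),): {"numRows": str(n), "STATS_GENERATED_VIA_STATS_TASK": "true"}
        for p, n in row_counts.items()
    },
)
print(stmt)
{code}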
was:
Users might need to update tblproperties of multiple partitions, e.g.
explicitly setting numRows and STATS_GENERATED_VIA_STATS_TASK=true in an ETL
pipeline:
{code:sql}
alter table my_tbl partition (p=1) set tblproperties('numRows'='1035',
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=2) set tblproperties('numRows'='1030',
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=3) set tblproperties('numRows'='1000',
'STATS_GENERATED_VIA_STATS_TASK'='true');
{code}
Similar to IMPALA-14089, supporting this in a single statement helps reduce
lock contention on the catalogd side. The table version also changes less
frequently, which avoids query planning retries (IMPALA-14695). Modifying
multiple partitions in a single operation also speeds up metadata reloading,
since those partitions are reloaded in parallel.
> Support multiple partitions in ALTER TABLE statement
> ----------------------------------------------------
>
> Key: IMPALA-14698
> URL: https://issues.apache.org/jira/browse/IMPALA-14698
> Project: IMPALA
> Issue Type: New Feature
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)