[ 
https://issues.apache.org/jira/browse/IMPALA-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-14698:
------------------------------------
    Description: 
Users might need to update tblproperties of multiple partitions, e.g. 
explicitly setting numRows and STATS_GENERATED_VIA_STATS_TASK=true in an ETL 
pipeline:
{code:sql}
alter table my_tbl partition (p=1) set tblproperties('numRows'='1035', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=2) set tblproperties('numRows'='1030', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=3) set tblproperties('numRows'='1000', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
{code}
Similar to IMPALA-14089, supporting this in a single statement helps reduce 
lock contention on the catalogd side. The table version also changes less 
frequently, which avoids query planning retries (IMPALA-14695). Modifying 
multiple partitions in a single operation also speeds up their metadata 
reload, since they will be reloaded in parallel.

The statement could look like:
{code:sql}
alter table my_tbl
  partition (p=1) set tblproperties('numRows'='1035', 
'STATS_GENERATED_VIA_STATS_TASK'='true')
  partition (p=2) set tblproperties('numRows'='1030', 
'STATS_GENERATED_VIA_STATS_TASK'='true')
  partition (p=3) set tblproperties('numRows'='1000', 
'STATS_GENERATED_VIA_STATS_TASK'='true');{code}
Note that we already support multiple partitions when the operation is the 
same, e.g. setting the same tblproperties:

{code:sql}
alter table my_alltypes partition (year in (2009, 2010), month in (1,2,3)) set 
tblproperties('numRows'='10');{code}
What is missing is the ability to set different tblproperties for different 
partitions in a single command.
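
For ETL pipelines that already know the per-partition row counts, the proposed statement could be generated rather than written by hand. The sketch below is only an illustration: the helper name build_alter_statement is hypothetical, it assumes a single integer partition column (so no quoting or escaping is done), and the multi-clause syntax it emits is the proposal from this ticket, not something Impala accepts today.

```python
def build_alter_statement(table, part_col, row_counts):
    """Render the single-statement form proposed in this ticket.

    Hypothetical helper: `row_counts` maps an integer partition value to its
    numRows. The emitted multi-clause syntax is a proposal, not supported SQL.
    """
    clauses = []
    for part_value, num_rows in sorted(row_counts.items()):
        clauses.append(
            f"  partition ({part_col}={part_value}) set tblproperties("
            f"'numRows'='{num_rows}', 'STATS_GENERATED_VIA_STATS_TASK'='true')"
        )
    # One ALTER TABLE header, one clause per partition, one terminating ';'.
    return f"alter table {table}\n" + "\n".join(clauses) + ";"


stmt = build_alter_statement("my_tbl", "p", {1: 1035, 2: 1030, 3: 1000})
print(stmt)
```

Until such syntax exists, the same loop could emit one ALTER TABLE statement per partition instead, at the cost of the lock contention and table version changes described above.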

  was:
Users might need to update tblproperties of multiple partitions, e.g. 
explicitly setting numRows and STATS_GENERATED_VIA_STATS_TASK=true in an ETL 
pipeline:
{code:sql}
alter table my_tbl partition (p=1) set tblproperties('numRows'='1035', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=2) set tblproperties('numRows'='1030', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
alter table my_tbl partition (p=3) set tblproperties('numRows'='1000', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
{code}
Similar to IMPALA-14089, supporting this in a single statement helps reduce 
lock contention on the catalogd side. The table version also changes less 
frequently, which avoids query planning retries (IMPALA-14695). Modifying 
multiple partitions in a single operation also speeds up their metadata 
reload, since they will be reloaded in parallel.


> Support multiple partitions in ALTER TABLE statement
> ----------------------------------------------------
>
>                 Key: IMPALA-14698
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14698
>             Project: IMPALA
>          Issue Type: New Feature
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> Users might need to update tblproperties of multiple partitions, e.g. 
> explicitly setting numRows and STATS_GENERATED_VIA_STATS_TASK=true in an ETL 
> pipeline:
> {code:sql}
> alter table my_tbl partition (p=1) set tblproperties('numRows'='1035', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true');
> alter table my_tbl partition (p=2) set tblproperties('numRows'='1030', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true');
> alter table my_tbl partition (p=3) set tblproperties('numRows'='1000', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true');
> {code}
> Similar to IMPALA-14089, supporting this in a single statement helps reduce 
> lock contention on the catalogd side. The table version also changes less 
> frequently, which avoids query planning retries (IMPALA-14695). Modifying 
> multiple partitions in a single operation also speeds up their metadata 
> reload, since they will be reloaded in parallel.
> The statement could look like:
> {code:sql}
> alter table my_tbl
>   partition (p=1) set tblproperties('numRows'='1035', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true')
>   partition (p=2) set tblproperties('numRows'='1030', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true')
>   partition (p=3) set tblproperties('numRows'='1000', 
> 'STATS_GENERATED_VIA_STATS_TASK'='true');{code}
> Note that we already support multiple partitions when the operation is the 
> same, e.g. setting the same tblproperties:
> {code:sql}
> alter table my_alltypes partition (year in (2009, 2010), month in (1,2,3)) 
> set tblproperties('numRows'='10');{code}
> What is missing is the ability to set different tblproperties for different 
> partitions in a single command.



