[ 
https://issues.apache.org/jira/browse/IMPALA-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852871#comment-17852871
 ] 

Riza Suminto commented on IMPALA-12981:
---------------------------------------

Computing stats over just a subset of columns currently relies on getting the 
column names from the SQL syntax of COMPUTE STATS:

[https://github.com/apache/impala/blob/753ee9b8a80d8e4c0db966a3132446a5aceb05cd/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L189-L191]

Supporting this feature will require running the subquery first to retrieve 
the list of column names before running the rest of the COMPUTE STATS child 
queries.
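
The two-phase flow described above could be sketched from the client side 
roughly as follows. This is only an illustration in Python, not the Impala 
frontend implementation: run_query is a hypothetical callable standing in for 
whatever executes SQL and returns rows, the helper names are made up, and the 
identifier check is a conservative assumption rather than Impala's actual 
identifier rules.

```python
import re
from typing import Callable, Iterable, List, Sequence

# Conservative identifier pattern (an assumption for this sketch, not
# Impala's real identifier grammar).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_compute_stats(table: str, columns: Iterable[str]) -> str:
    """Build a COMPUTE STATS statement with an explicit column list."""
    for part in table.split("."):
        if not _IDENT.match(part):
            raise ValueError(f"unexpected table name: {table!r}")

    ordered: List[str] = []
    for col in columns:
        name = col.strip().lower()
        if not _IDENT.match(name):
            raise ValueError(f"unexpected column name: {col!r}")
        if name not in ordered:  # drop duplicates, keep first-seen order
            ordered.append(name)

    # An empty list would silently turn into a full-table COMPUTE STATS,
    # so this sketch refuses it instead.
    if not ordered:
        raise ValueError("subquery returned no column names")

    return f"COMPUTE STATS {table} ({', '.join(ordered)})"


def compute_stats_from_subquery(
    table: str,
    column_query: str,
    run_query: Callable[[str], Sequence[Sequence[str]]],
) -> str:
    """Phase 1: run the subquery. Phase 2: run COMPUTE STATS on the result."""
    rows = run_query(column_query)  # first column of each row is a column name
    stmt = build_compute_stats(table, (row[0] for row in rows))
    run_query(stmt)
    return stmt
```

Doing the same thing inside Impala would mean the column list is not known 
at analysis time, which is why the subquery has to finish before the child 
queries can be generated.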

> Support a column list in compute stats that is retrieved via a subquery  
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-12981
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12981
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Manish Maheshwari
>            Priority: Major
>
> Support a column list in compute stats that is retrieved via a subquery. 
> Specifically, we want to use the Impala query history tables, where we 
> collect the columns in a table that are used for joins, aggregates, filters, 
> etc., and pass them into the compute stats command.
> Ideally, we would generate a table from the query history table that has the 
> most frequently accessed tables and columns, and then feed them into the 
> compute stats command. 
> Suggested Syntax - 
> {code:java}
> -- Table level
> compute stats db.tbl (
> select distinct join_columns
> from sys.impala_query_log
> where contains(tables_queried, "db.tbl")
> and query_dttm > current_timestamp()-7
> and join_columns rlike 'db.tbl'
> ) 
> -- Across tables
> compute stats on (select tables, columns from sys.impala_query_log where 
> query_dttm > current_timestamp()-7 group by tables, columns having 
> count(1) > 1000 order by tables, columns, count(1) desc)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
