[
https://issues.apache.org/jira/browse/IMPALA-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852871#comment-17852871
]
Riza Suminto commented on IMPALA-12981:
---------------------------------------
Computing stats over just a subset of columns currently relies on taking the
column names from the SQL text of the COMPUTE STATS statement:
[https://github.com/apache/impala/blob/753ee9b8a80d8e4c0db966a3132446a5aceb05cd/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L189-L191]
Supporting this feature will require running the subquery first to retrieve
the list of column names, before running the rest of the COMPUTE STATS child
queries.
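
A rough client-side sketch of that two-phase flow, in Python, may help frame
the discussion. It is not how ComputeStatsStmt works today. The names
build_compute_stats, compute_stats_from_subquery and run_query are
hypothetical, and run_query stands in for whatever executes SQL and returns
rows. The only existing syntax it relies on is COMPUTE STATS with an explicit
column list.

```python
import re
from typing import Callable, Iterable, List, Sequence

# Conservative check for unquoted identifiers (an assumption of this sketch).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_compute_stats(table: str, columns: Iterable[str]) -> str:
    """Build a COMPUTE STATS statement with an explicit column list."""
    unique: List[str] = []
    for col in columns:
        # Column names are case-insensitive, so normalize before de-duplicating.
        name = col.strip().lower()
        if not _IDENT.match(name):
            raise ValueError(f"not a plain column name: {col!r}")
        if name not in unique:
            unique.append(name)
    if not unique:
        raise ValueError("subquery returned no columns")
    return f"COMPUTE STATS {table} ({', '.join(unique)})"


def compute_stats_from_subquery(
    table: str,
    column_query: str,
    run_query: Callable[[str], Sequence[Sequence[str]]],
) -> str:
    """Phase 1: run the subquery. Phase 2: run COMPUTE STATS on its result."""
    rows = run_query(column_query)
    columns = [row[0] for row in rows]  # first column holds the column name
    statement = build_compute_stats(table, columns)
    run_query(statement)
    return statement
```

Doing the same inside Impala would mean resolving the column list before the
child queries are generated, which is the ordering problem described above.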
> Support a column list in compute stats that is retrieved via a subquery
> -------------------------------------------------------------------------
>
> Key: IMPALA-12981
> URL: https://issues.apache.org/jira/browse/IMPALA-12981
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Reporter: Manish Maheshwari
> Priority: Major
>
> Support a column list in compute stats that is retrieved via a subquery.
> Specifically, we want to use the Impala query history tables, where we collect
> the columns of a table that are used for joins, aggregates, filters, etc., and
> pass them into the compute stats command.
> Ideally, we would generate a table from the query history table that holds
> the most frequently accessed tables and columns, and then feed them into the
> compute stats command.
> Suggested Syntax -
> {code:java}
> -- Table level
> compute stats db.tbl (
> select distinct join_columns
> from sys.impala_query_log
> where contains(tables_queried, "db.tbl")
> and query_dttm > current_timestamp()-7
> and join_columns rlike 'db.tbl'
> )
> -- Across tables
> compute stats on (select tables, columns from sys.impala_query_log where
> query_dttm > current_timestamp()-7 group by tables, columns
> having count(1) > 1000 order by tables, columns, count(1) desc )
> {code}
>