[jira] [Commented] (IMPALA-12737) Include List of Referenced Columns in Query Log Table

ASF subversion and git services (Jira) Wed, 28 Aug 2024 15:20:03 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877546#comment-17877546 ]


ASF subversion and git services commented on IMPALA-12737:
----------------------------------------------------------

Commit 77a87bb103362ebafb0624f95d1a413417763d66 in impala's branch 
refs/heads/master from jasonmfehr
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=77a87bb10 ]

IMPALA-12737: Refactor the Workload Management Initialization Process.

The workload management initialization process creates the two tables
"sys.impala_query_log" and "sys.impala_query_live" during coordinator
startup.

The current design for this init process is to create both tables on
each coordinator at every startup by running CREATE DATABASE and
CREATE TABLE IF NOT EXISTS DDLs. This design causes unnecessary DDLs
to execute, which delays coordinator startup and introduces the
potential for unnecessary startup failures should the DDLs fail.

This patch splits the initialization code into its own file and adds
version tracking to the individual fields in the workload management
tables. It also adds schema version checks on the workload management
tables so that the database and table DDLs run only when necessary.
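The "only run DDLs if necessary" idea can be sketched as a small planning step. This is a minimal Python illustration, not the patch's actual code: Impala's coordinator is written in C++, and the commit message does not say how the existing schema version is read. The function `plan_ddls` and its action tuples are hypothetical; the sketch assumes the caller already knows whether the `sys` database exists and which schema version, if any, each table currently has.

```python
from typing import List, Optional, Tuple

# Hypothetical action tuples: (action, object name).
Action = Tuple[str, str]

TARGET_VERSION = "1.0.0"  # the only allowed schema version in this patch


def plan_ddls(db_exists: bool, table: str, existing_version: Optional[str],
              target_version: str = TARGET_VERSION) -> List[Action]:
    """Return only the DDL actions that actually need to run for one table.

    existing_version is None when the table does not exist yet.
    """
    db_name = table.split(".", 1)[0]
    actions: List[Action] = []
    if not db_exists:
        actions.append(("create_database", db_name))
    if existing_version is None:
        # Missing table: create it at the target schema version.
        actions.append(("create_table", table))
    elif existing_version != target_version:
        # Table exists at a different version. "upgrade_table" is a
        # hypothetical placeholder; this patch allows only 1.0.0.
        actions.append(("upgrade_table", table))
    # Table already at the target version: no DDL is needed.
    return actions


if __name__ == "__main__":
    # Typical coordinator restart: both tables already exist at 1.0.0.
    for t in ("sys.impala_query_log", "sys.impala_query_live"):
        print(t, plan_ddls(True, t, "1.0.0"))  # prints an empty list for each
```

On a warm restart the plan is empty, so no DDL runs at all, which is the startup delay and failure risk the commit message aims to remove.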

Additionally, versioning of workload management table schemas is
introduced. The only allowed schema version in this patch is 1.0.0.
Future patches that need to modify the workload management table
schema will expand this list of allowed versions.
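Per-field version tracking combined with an allow-list of schema versions might look like the following sketch. Only the allowed version 1.0.0 comes from the commit message; `FIELD_VERSIONS`, `fields_for`, and the field names (including `hypothetical_new_field`) are illustrative assumptions, not Impala's actual registry.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a MAJOR.MINOR.PATCH string into a comparable tuple."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return (major, minor, patch)


# Per the commit message, 1.0.0 is the only allowed version for now.
ALLOWED_VERSIONS = {parse_version("1.0.0")}

# Hypothetical registry: each field records the schema version that added it.
FIELD_VERSIONS: Dict[str, str] = {
    "query_id": "1.0.0",
    "sql": "1.0.0",
    "hypothetical_new_field": "1.1.0",  # would need 1.1.0 to be allowed
}


def fields_for(schema_version: str) -> List[str]:
    """Return the fields present in a table at the given schema version."""
    v = parse_version(schema_version)
    if v not in ALLOWED_VERSIONS:
        raise ValueError(f"unsupported schema version {schema_version}")
    # Tuple comparison orders versions component by component.
    return [name for name, since in FIELD_VERSIONS.items()
            if parse_version(since) <= v]
```

Tagging each field with the version that introduced it lets a future patch add columns by appending to the registry and extending `ALLOWED_VERSIONS`, without touching the fields of older versions.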

Since this patch is a refactor and does not change functionality,
testing was accomplished by running existing workload management
unit and python tests.

Change-Id: Id645f94c8da73b91c13a23d7ac0ea026425f0f96
Reviewed-on: http://gerrit.cloudera.org:8080/21653
Reviewed-by: Riza Suminto <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Include List of Referenced Columns in Query Log Table
> -----------------------------------------------------
>
>                 Key: IMPALA-12737
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12737
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Manish Maheshwari
>            Assignee: Jason Fehr
>            Priority: Critical
>              Labels: workload-management
>
> In the Impala query log table where completed queries are stored, add lists 
> of columns that were referenced in the query. The purpose behind this 
> functionality is to know which columns are part of 
>  * Select clause
>  * Where clause
>  * Join clause
>  * Aggregate clause
>  * Order by clause
> There should be a column for each type of clause, so that decisions can be 
> made based on specific usage or on the union of those clauses.
> This information will be fed into the compute stats command to collect
> stats only on the columns that are used in joins, filters, and
> aggregates, rather than on all the table columns.
> The information can be collected as an array of 
> [db1.table1.column1,db1.table1.column2]
>  
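The data shape proposed in the issue description above (one list of db.table.column names per clause, plus a union that drives statistics collection) could be modeled as in this hypothetical sketch. `ReferencedColumns` and `compute_stats_statements` are not Impala APIs. The union deliberately covers only the join, WHERE, and aggregate lists that the description singles out. The generated statements assume Impala's `COMPUTE STATS table (column_list)` form; check that your Impala version accepts a column list.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ReferencedColumns:
    # One list per clause type; each entry is fully qualified as db.table.column.
    select: List[str] = field(default_factory=list)
    where: List[str] = field(default_factory=list)
    join: List[str] = field(default_factory=list)
    aggregate: List[str] = field(default_factory=list)
    order_by: List[str] = field(default_factory=list)

    def stats_candidates(self) -> Set[str]:
        # Union of the clauses used for filtering, joining and aggregating.
        return set(self.join) | set(self.where) | set(self.aggregate)


def compute_stats_statements(logged: List[ReferencedColumns]) -> List[str]:
    """Group candidate columns by table and emit one statement per table."""
    by_table: Dict[str, Set[str]] = defaultdict(set)
    for entry in logged:
        for qualified in entry.stats_candidates():
            # Raises ValueError if an entry is not db.table.column.
            db, table, column = qualified.split(".", 2)
            by_table[f"{db}.{table}"].add(column)
    return [f"COMPUTE STATS {t} ({', '.join(sorted(cols))})"
            for t, cols in sorted(by_table.items())]
```

For example, a query that selects `db1.t1.a`, filters on `db1.t1.b`, joins on `db1.t1.id` and `db1.t2.id`, and aggregates `db1.t2.amount`, plus a second query filtering on `db1.t1.b` and `db1.t1.c`, yields `COMPUTE STATS db1.t1 (b, c, id)` and `COMPUTE STATS db1.t2 (amount, id)`. The select-only column `a` is skipped.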




