Tony Hill created IMPALA-8331:
---------------------------------

             Summary: Provide method for listing tables without statistics
                 Key: IMPALA-8331
                 URL: https://issues.apache.org/jira/browse/IMPALA-8331
             Project: IMPALA
          Issue Type: New Feature
          Components: Catalog
            Reporter: Tony Hill


Lack of statistics can cause performance issues with Impala making "wrong" 
choices during the planning stage. It would be useful to provide a mechanism to 
list tables with without statistics, or with corrupt statistics.

It can be done indirectly with queries on the metastore DB:

Tables without columns stats:
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID 
WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE 
TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='COLUMN_STATS_ACCURATE' AND 
PARAM_VALUE='true');

Tables without row stats
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID 
WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE 
TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows');

 

Impala shows messages about corrupt statistics where size >0 but row=0, i.e.

 

SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID 
WHERE EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID 
AND PARAM_KEY='numRows' AND PARAM_VALUE=0 ) AND EXISTS (SELECT * FROM 
TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='totalSize' 
AND PARAM_VALUE>0 );

 

(There are almost certainly more efficient queries !)

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to