Tony Hill created IMPALA-8331:
---------------------------------
Summary: Provide method for listing tables without statistics
Key: IMPALA-8331
URL: https://issues.apache.org/jira/browse/IMPALA-8331
Project: IMPALA
Issue Type: New Feature
Components: Catalog
Reporter: Tony Hill
Lack of statistics can cause performance issues with Impala making "wrong"
choices during the planning stage. It would be useful to provide a mechanism to
list tables with without statistics, or with corrupt statistics.
It can be done indirectly with queries on the metastore DB:
Tables without columns stats:
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID
WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE
TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='COLUMN_STATS_ACCURATE' AND
PARAM_VALUE='true');
Tables without row stats
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID
WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE
TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows');
Impala shows messages about corrupt statistics where size >0 but row=0, i.e.
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID
WHERE EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID
AND PARAM_KEY='numRows' AND PARAM_VALUE=0 ) AND EXISTS (SELECT * FROM
TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='totalSize'
AND PARAM_VALUE>0 );
(There are almost certainly more efficient queries !)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)