shin chen created HIVE-21982:
--------------------------------

             Summary: hive does not use stats even after analyzing the table
                 Key: HIVE-21982
                 URL: https://issues.apache.org/jira/browse/HIVE-21982
             Project: Hive
          Issue Type: Bug
          Components: Hive
         Environment: HDP

Hive 1.2.1000.2.6.5.0-292
            Reporter: shin chen


 

Settings:
{code:java}
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
{code}
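To rule out a session-level override, the effective values can be printed back. In Hive, `set <property>;` with no value echoes the current setting. A minimal check, assuming it is run in the same session as the queries in this report:

{code:sql}
-- Print the effective value of each property (no assignment, so nothing is changed).
set hive.cbo.enable;
set hive.compute.query.using.stats;
set hive.stats.fetch.column.stats;
set hive.stats.fetch.partition.stats;
{code}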
{code:java}
// desc extended **.** partition(month=**,day=**,hour=**);
..... parameters:{transient_lastDdlTime=1561958282, totalSize=16413917810, 
numFiles=3}
{code}
The partition has not been analyzed yet, so even a simple query has to scan the data:
{code:java}
SELECT count(*) FROM **.** WHERE month='**' AND day='**' AND hour='**';
.... 1 row selected (52.756 seconds){code}
After analyzing the partition, the same query is no faster:
{code:java}
// Analyze the partition first
analyze table **.** partition(month='**',day='**',hour='**') compute statistics;
// Then run the same count(*) query again
SELECT count(*) FROM **.** WHERE month='**' AND day='**' AND hour='**';
.... 1 row selected (58.326 seconds){code}
Hive still does not answer the query from the stats metadata; it takes as long as before.

Describing the partition again shows that the basic stats are now populated:
{code:java}
....
parameters:{totalSize=16413917811, numRows=37975264, rawDataSize=4670957472, 
COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=3, 
transient_lastDdlTime=1562669873})
{code}
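One way to check whether the optimizer answers from metadata is to look at the query plan. The sketch below assumes the same session settings and keeps the masked table name and partition values exactly as they appear in this report. If the stats were being used, the plan would be expected to contain only a fetch stage, with no Map Reduce or Tez stage that scans the table.

{code:sql}
-- Show the plan for the count(*) query instead of running it.
EXPLAIN
SELECT count(*) FROM **.** WHERE month='**' AND day='**' AND hour='**';
{code}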
Any advice here?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
