Amareshwari Sriramadasu created HIVE-3962:
---------------------------------------------

             Summary: number of distinct values is wrong in column statistics
                 Key: HIVE-3962
                 URL: https://issues.apache.org/jira/browse/HIVE-3962
             Project: Hive
          Issue Type: Bug
          Components: Statistics
    Affects Versions: 0.10.0
            Reporter: Amareshwari Sriramadasu


When we run the following query on the Hive QL src table:

select count(distinct(key)), count(distinct(value)) from src;
309 309

After running the following analyze query, the NUM_DISTINCTS values stored in the metastore do not match these counts (291 for key and 112 for value, instead of 309):

analyze table src compute statistics for columns key, value; 

--- stats in metastore ---

mysql > select * from TAB_COL_STATS where TABLE_NAME="src";

| CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
|     5 | default | src        | key         | int         |     11 |              0 |             498 |            0.0000 |           0.0000 | NULL                  | NULL                   |         0 |           291 |      0.0000 |           0 |         0 |          0 |    1359539181 |
|     6 | default | src        | value       | string      |     11 |              0 |               0 |            0.0000 |           0.0000 | NULL                  | NULL                   |         0 |           112 |      6.8120 |           7 |         0 |          0 |    1359539181 |



