H Milyakov created IMPALA-6620:
----------------------------------

             Summary: Compute incremental stats for groups of partitions does 
not update stats correctly
                 Key: IMPALA-6620
                 URL: https://issues.apache.org/jira/browse/IMPALA-6620
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 2.8.0
         Environment: Impala - v2.8.0-cdh5.11.1 
We are using the embedded Hive Metastore database (provided by Cloudera) 
It is PostgreSQL 8.4.20 
OS: CentOS 
            Reporter: H Milyakov


Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`) 
does not compute statistics correctly (it computes 0) when `partition clause` 
matches more than one partition.

Executing the same command when `partition clause` matches only a single 
partition results in statistics being computed correctly (neither 0 nor -1).

The issue was observed on our production cluster for a table with 40 000 
partitions and 20 columns.
I copied the table to a separate, isolated cluster and observed the same 
behaviour.
We use Impala 2.8.0 in Cloudera CDH 5.11.

The issue can be reproduced as follows:
 1. CREATE TABLE my_test_table ( some_ints BIGINT )
 PARTITIONED BY ( part_1 BIGINT, part_2 STRING ) 
 STORED AS PARQUET;
 
 2. Populate the table's only non-partition column, 'some_ints', so that there 
are 10 000 distinct (part_1, part_2) partitions.
 The total number of records in the table does not matter and can be the same 
as the number of partitions.
 
 3. Run COMPUTE INCREMENTAL STATS as described above to reproduce the issue.
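
The steps above can be sketched in Impala SQL roughly as follows. This is a 
scaled-down illustration, not the exact statements used on our cluster: the 
four sample rows, the partition values and the predicates in the PARTITION 
clauses are assumptions made for the example, and it is not confirmed that 
so few partitions are enough to trigger the problem (the report used 10 000).

```sql
-- Hypothetical sample data: 4 partitions instead of the 10 000 in the report.
-- The last two values of each row feed the partition columns part_1, part_2.
INSERT INTO my_test_table PARTITION (part_1, part_2)
VALUES (1, 1, 'a'), (2, 2, 'a'), (3, 3, 'b'), (4, 4, 'b');

-- Partition clause matching several partitions (the case reported as broken).
COMPUTE INCREMENTAL STATS my_test_table PARTITION (part_1 <= 2);

-- Partition clause matching exactly one partition (the case reported as working).
COMPUTE INCREMENTAL STATS my_test_table PARTITION (part_1 = 3, part_2 = 'b');

-- Inspect the per-partition #Rows column; -1 means no stats have been computed.
SHOW TABLE STATS my_test_table;
```

Comparing the #Rows values for the partitions covered by the first COMPUTE 
statement with the one covered by the second should show whether the 
multi-partition case wrote 0 instead of the real row count.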


Has anybody faced a similar issue, or does anyone have more information on 
this case?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
