H Milyakov created IMPALA-6620:
Summary: Compute incremental stats for groups of partitions does
not update stats correctly
Issue Type: Bug
Affects Versions: Impala 2.8.0
Environment: Impala - v2.8.0-cdh5.11.1
We are using the embedded Hive Metastore database (provided by Cloudera).
It is PostgreSQL 8.4.20.
Reporter: H Milyakov
Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
does not compute statistics correctly (it computes 0) when `partition clause`
matches more than one partition.
Executing the same command when `partition clause` matches just a single
partition computes the statistics correctly (values other than 0 and -1).
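To tell the two outcomes apart for each partition, one option is to compare the `#Rows` value reported by `SHOW PARTITIONS` with an actual `COUNT(*)` per partition. The Python sketch below assumes both result sets have already been exported into dictionaries keyed by partition values (here `(part_1, part_2)`, as in the reproduction table). The function name `classify_stats` and its three buckets are hypothetical and are not part of Impala.

```python
from typing import Dict, List, Tuple

# Hypothetical partition key: the (part_1, part_2) values of a partition.
Partition = Tuple[int, str]


def classify_stats(
    stats_rows: Dict[Partition, int], actual_rows: Dict[Partition, int]
) -> Dict[str, List[Partition]]:
    """Bucket partitions whose #Rows statistic disagrees with COUNT(*).

    stats_rows:  #Rows per partition as shown by SHOW PARTITIONS
                 (-1 means Impala has no statistic for that partition).
    actual_rows: row counts from
                 SELECT part_1, part_2, COUNT(*) ... GROUP BY part_1, part_2.
    """
    buckets: Dict[str, List[Partition]] = {"missing": [], "zero": [], "wrong": []}
    for part, actual in sorted(actual_rows.items()):
        stat = stats_rows.get(part, -1)  # an absent entry is treated as "no stats"
        if stat == actual:
            continue  # statistic matches the real row count
        if stat == -1:
            buckets["missing"].append(part)  # stats were never computed
        elif stat == 0:
            buckets["zero"].append(part)  # the symptom described in this report
        else:
            buckets["wrong"].append(part)  # some other mismatch
    return buckets
```

Under this assumption, partitions covered by a multi-partition COMPUTE INCREMENTAL STATS would be expected to land in `zero`, while a partition computed on its own would not appear in any bucket.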
The issue was observed on our production cluster for a table with 40 000
partitions and 20 columns.
I copied the table to a separate, isolated cluster and observed the same
behavior there. We use Impala 2.8.0 in Cloudera CDH 5.11.
The issue can be reproduced as follows:
1. CREATE TABLE my_test_table ( some_ints BIGINT )
PARTITIONED BY ( part_1 BIGINT, part_2 STRING )
STORED AS PARQUET;
2. Populate the only non-partition column, 'some_ints', so that there are
10 000 distinct (part_1, part_2) partitions.
The total number of records in the table does not matter; it can be the same
as the number of distinct partitions.
3. Run COMPUTE INCREMENTAL STATS with a partition clause that matches more than
one partition, as described above; this reproduces the issue.
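The three steps can be turned into a single SQL script. The Python sketch below only generates the statements; it has not been run against a cluster. Several details are assumptions rather than part of the report: the 100 x 100 split of partition values, the `'p0'`, `'p1'`, ... strings for part_2, one row per partition, loading with INSERT ... VALUES (static part_1, dynamic part_2), the control partition `part_1 = n_part_1 - 1`, and the `part_1 < group` filter for the multi-partition case.

```python
def repro_script(n_part_1: int = 100, n_part_2: int = 100, group: int = 10) -> str:
    """Build an Impala SQL script for the reproduction steps (sketch).

    Hypothetical layout: part_1 in [0, n_part_1), part_2 in {'p0', 'p1', ...},
    one row per partition. Defaults give 100 * 100 = 10 000 partitions.
    """
    if not 0 < group < n_part_1:
        # Keep the control partition (part_1 = n_part_1 - 1) outside the group.
        raise ValueError("group must satisfy 0 < group < n_part_1")

    # Step 1: table definition from the report.
    stmts = [
        "CREATE TABLE my_test_table ( some_ints BIGINT )\n"
        "PARTITIONED BY ( part_1 BIGINT, part_2 STRING )\n"
        "STORED AS PARQUET;"
    ]

    # Step 2: part_1 is static and part_2 is dynamic, so each INSERT creates
    # n_part_2 partitions. VALUES tuples are (some_ints, part_2).
    for i in range(n_part_1):
        values = ", ".join(f"({i * n_part_2 + j}, 'p{j}')" for j in range(n_part_2))
        stmts.append(
            f"INSERT INTO my_test_table PARTITION (part_1={i}, part_2) VALUES {values};"
        )

    # Step 3a: control case, the clause matches exactly one partition.
    control = n_part_1 - 1
    stmts.append(
        "COMPUTE INCREMENTAL STATS my_test_table "
        f"PARTITION (part_1={control}, part_2='p0');"
    )
    # Step 3b: reported case, the clause matches group * n_part_2 partitions.
    stmts.append(f"COMPUTE INCREMENTAL STATS my_test_table PARTITION (part_1 < {group});")

    # Inspect per-partition #Rows and per-column statistics afterwards.
    stmts.append("SHOW PARTITIONS my_test_table;")
    stmts.append("SHOW COLUMN STATS my_test_table;")
    return "\n".join(stmts)


if __name__ == "__main__":
    print(repro_script())
```

The output could be saved to a file and run with `impala-shell -f`. INSERT ... VALUES writes one small Parquet file per partition, which is acceptable for a reproduction but not for loading real data.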
Has anybody faced a similar issue, or does anyone have more information on this case?
This message was sent by Atlassian JIRA