Fwd: [jira] [Created] (IMPALA-6620) Compute incremental stats for groups of partitions does not update stats correctly

Jim Apple Sun, 01 Apr 2018 22:57:07 -0700

I feel like I saw a similar JIRA and patch recently. Is this addressed in
another ticket?

If not, it feels like a P2 to me: it’s not exactly incorrect, but I expect
it means that some calls to COMPUTE STATS would decrease query performance
in a very avoidable way.

---------- Forwarded message ---------
From: H Milyakov (JIRA) <[email protected]>
Date: Wed, Mar 7, 2018 at 4:57 AM
Subject: [jira] [Created] (IMPALA-6620) Compute incremental stats for
groups of partitions does not update stats correctly
To: <[email protected]>

H Milyakov created IMPALA-6620:
----------------------------------

             Summary: Compute incremental stats for groups of partitions
does not update stats correctly
                 Key: IMPALA-6620
                 URL: https://issues.apache.org/jira/browse/IMPALA-6620
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 2.8.0
         Environment: Impala - v2.8.0-cdh5.11.1
We are using the embedded Hive Metastore database (provided by Cloudera).
It is PostgreSQL 8.4.20.
OS: CentOS
            Reporter: H Milyakov

Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
does not compute statistics correctly (computes 0) when `partition clause`
matches more than one partition.

Executing the same command when `partition clause` matches just a single
partition results in statistics being computed correctly (neither 0 nor -1).

The issue was observed on our production cluster for a table with 40 000
partitions and 20 columns.
I have copied the table to a separate, isolated cluster and observed the same
behaviour.
We use Impala 2.8.0 in Cloudera CDH 5.11.

The issue can be reproduced with the following steps:
 1. CREATE TABLE my_test_table ( some_ints BIGINT )
 PARTITIONED BY ( part_1 BIGINT, part_2 STRING )
 STORED AS PARQUET;

 2. The only column 'some_ints' is populated so that there are 10 000
different partitions (part_1, part_2).
 The total number of records in the table does not matter and could be the
same as the number of different partitions.

 3. Then running COMPUTE INCREMENTAL STATS as described above reproduces the
issue.
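
The three steps above could be scripted roughly as follows. This is only a
sketch: it builds the Impala SQL as text and does not connect to a cluster.
The helper name build_repro_statements, the partition values (i, 'p<i>'), the
batch size, and both partition predicates are assumptions made for
illustration, since the report does not say which partition clause was used.

```python
# Hypothetical helper: emits Impala SQL text for the repro steps.
# Nothing here talks to a cluster; paste the output into impala-shell.
def build_repro_statements(num_partitions=10_000, batch_size=500,
                           table="my_test_table"):
    statements = [
        # Step 1: the table definition from the report.
        f"CREATE TABLE {table} (some_ints BIGINT) "
        "PARTITIONED BY (part_1 BIGINT, part_2 STRING) STORED AS PARQUET"
    ]

    # Step 2: one row per partition, inserted with dynamic partitioning.
    # Partition columns come last in each VALUES row.
    # The values (i, 'p<i>') are assumed; the report does not specify them.
    for start in range(0, num_partitions, batch_size):
        stop = min(start + batch_size, num_partitions)
        rows = ", ".join(f"({i}, {i}, 'p{i}')" for i in range(start, stop))
        statements.append(
            f"INSERT INTO {table} PARTITION (part_1, part_2) VALUES {rows}"
        )

    # Step 3a: a predicate that matches many partitions (assumed predicate).
    statements.append(
        f"COMPUTE INCREMENTAL STATS {table} PARTITION (part_1 >= 0)"
    )
    statements.append(f"SHOW TABLE STATS {table}")

    # Step 3b: a predicate that matches exactly one partition, for comparison.
    statements.append(
        f"COMPUTE INCREMENTAL STATS {table} "
        "PARTITION (part_1 = 0 AND part_2 = 'p0')"
    )
    statements.append(f"SHOW TABLE STATS {table}")
    return statements


if __name__ == "__main__":
    # Print a small variant; raise num_partitions to approach the reported scale.
    for stmt in build_repro_statements(num_partitions=4, batch_size=2):
        print(stmt + ";")
```

Comparing the #Rows column of the two SHOW TABLE STATS outputs should show
whether the multi-partition run left the row counts at 0 while the
single-partition run filled them in.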

Has anybody faced a similar issue, or does anyone have more information on
this case?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
