It's addressed in IMPALA-5615.

On Mon, Apr 2, 2018 at 7:56, Jim Apple <[email protected]> wrote:
> I feel like I saw a similar JIRA and patch recently. Is this addressed in
> another ticket?
>
> If not, it feels like a P2 to me: it’s not exactly incorrect, but I expect
> it means that some calls to COMPUTE STATS would decrease query performance
> in a very avoidable way.
>
> ---------- Forwarded message ---------
> From: H Milyakov (JIRA) <[email protected]>
> Date: Wed, Mar 7, 2018 at 4:57 AM
> Subject: [jira] [Created] (IMPALA-6620) Compute incremental stats for
> groups of partitions does not update stats correctly
> To: <[email protected]>
>
>
> H Milyakov created IMPALA-6620:
> ----------------------------------
>
>              Summary: Compute incremental stats for groups of partitions
>                       does not update stats correctly
>                  Key: IMPALA-6620
>                  URL: https://issues.apache.org/jira/browse/IMPALA-6620
>              Project: IMPALA
>           Issue Type: Bug
>           Components: Catalog
>     Affects Versions: Impala 2.8.0
>          Environment: Impala - v2.8.0-cdh5.11.1
>                       We are using the Hive Metastore database embedded
>                       (by Cloudera). It's Postgres 8.4.20.
>                       OS: CentOS
>             Reporter: H Milyakov
>
>
> Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
> does not compute statistics correctly (computes 0) when `partition clause`
> matches more than one partition.
>
> Executing the same command when `partition clause` matches just a single
> partition results in statistics being computed correctly (non 0 and
> non -1).
>
> The issue was observed on our production cluster for a table with 40 000
> partitions and 20 columns.
> I have copied the table to a separate isolated cluster and observed the
> same behaviour.
> We use Impala 2.8.0 in Cloudera CDH 5.11.
>
> The issue can be reproduced with the following:
> 1. CREATE TABLE my_test_table ( some_ints BIGINT )
>    PARTITIONED BY ( part_1 BIGINT, part_2 STRING )
>    STORED AS PARQUET;
>
> 2. The only column, 'some_ints', is populated so that there are 10 000
>    different partitions (part_1, part_2).
>    The total number of records in the table does not matter and could be
>    the same as the number of different partitions.
>
> 3. Then running COMPUTE INCREMENTAL STATS as described above reproduces
>    the issue.
>
>
> Has anybody faced a similar issue, or does anyone have more info on the
> case?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
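
For anyone who wants to try the reproduction steps quoted above, here is a rough sketch in Impala SQL. The table definition is taken from the report. The partition values, the three-partition scale (the report used 10 000) and the `part_1 < 3` predicate are my own assumptions for illustration, not something the reporter specified.

```sql
-- Step 1: table definition, as given in the report.
CREATE TABLE my_test_table (some_ints BIGINT)
PARTITIONED BY (part_1 BIGINT, part_2 STRING)
STORED AS PARQUET;

-- Step 2: populate a few distinct (part_1, part_2) partitions.
-- Assumption: three made-up partitions stand in for the report's 10 000.
INSERT INTO my_test_table PARTITION (part_1 = 1, part_2 = 'a') VALUES (1);
INSERT INTO my_test_table PARTITION (part_1 = 2, part_2 = 'b') VALUES (2);
INSERT INTO my_test_table PARTITION (part_1 = 3, part_2 = 'c') VALUES (3);

-- Step 3a: partition clause matching more than one partition.
-- Assumption: the report does not give the predicate; this one matches two.
COMPUTE INCREMENTAL STATS my_test_table PARTITION (part_1 < 3);
SHOW TABLE STATS my_test_table;

-- Step 3b: partition clause matching exactly one partition, for comparison.
COMPUTE INCREMENTAL STATS my_test_table PARTITION (part_1 = 3, part_2 = 'c');
SHOW TABLE STATS my_test_table;
SHOW COLUMN STATS my_test_table;
```

Per the report, the multi-partition form in step 3a is where the zero values showed up on 2.8.0, while the single-partition form in step 3b produced sensible numbers. Comparing the `SHOW TABLE STATS` output after each step should make the difference visible, if it reproduces at this small scale.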
