[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

logan.zheng updated IMPALA-10230:
---------------------------------
    Comment: was deleted

(was: Steps to reproduce this issue on Impala 3.3+:

1. Create the table
create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY 
(ds int) STORED AS PARQUET;

2. Insert data
insert overwrite table test_column_stats partition(ds=20200101)
select 'tt' str1 ,'20200101' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200103)
select 'tt2' str1 ,'20200103' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200104)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200105)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

3. Compute incremental stats
compute incremental stats test_column_stats partition(ds=20200101);
compute incremental stats test_column_stats partition(ds=20200103);
compute incremental stats test_column_stats partition(ds=20200104);

4. Update the metastore database
SELECT d.`NAME`,t.`TBL_NAME`,p.*,pp.*
FROM `PARTITIONS` p,`TBLS` t,`DBS` d,`PARTITION_PARAMS` pp
 WHERE d.`NAME`='default' AND t.`TBL_NAME`='test_column_stats'
 AND t.DB_ID=d.DB_ID AND p.TBL_ID=t.TBL_ID
 AND p.PART_ID=pp.PART_ID;
update PARTITION_PARAMS
set 
PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=='
where PARAM_KEY='impala_intermediate_stats_chunk0';
PARAM_VALUE holds a serialized TPartitionStats object. The key point is that it contains num_nulls=-1.
// Intermediate state for the computation of per-column stats. Impala can aggregate these
// structures together to produce final stats for a column.
struct TIntermediateColumnStats {
  // One byte for each bucket of the NDV HLL computation
  1: optional binary intermediate_ndv

  // If true, intermediate_ndv is RLE-compressed
  2: optional bool is_ndv_encoded

  // Number of nulls seen so far (or -1 if nulls are not counted)
  3: optional i64 num_nulls

  // The maximum width, in bytes, of the column
  4: optional i32 max_width

  // The average width (in bytes) of the column
  5: optional double avg_width

  // The number of rows counted, needed to compute NDVs from intermediate_ndv
  6: optional i64 num_rows
}

// Per-partition statistics
struct TPartitionStats {
  // Number of rows gathered per-partition by non-incremental stats.
  // TODO: This can probably be removed in favour of the intermediate_col_stats, but doing
  // so would interfere with the non-incremental stats path
  1: required TTableStats stats

  // Intermediate state for incremental statistics, one entry per column name.
  2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}
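
To check which column carries num_nulls=-1, the PARAM_VALUE from step 4 can be decoded offline. The following is a minimal Python sketch, not Impala code. It assumes the value is the base64 form of a TPartitionStats serialized with Thrift's compact protocol (the bytes in this comment parse cleanly that way) and that the whole payload fits in impala_intermediate_stats_chunk0. The Reader class and decode_partition_stats helper are hypothetical names introduced for illustration; field ids follow the struct definitions above.

```python
import base64
import struct

# Thrift compact-protocol type ids.
(BOOL_TRUE, BOOL_FALSE, BYTE, I16, I32, I64,
 DOUBLE, BINARY, LIST, SET, MAP, STRUCT) = range(1, 13)


class Reader:
    """Tiny schema-less reader for Thrift compact-protocol bytes (illustrative only)."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def take(self, n):
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def varint(self):
        result = shift = 0
        while True:
            b = self.take(1)[0]
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result
            shift += 7

    def zigzag(self):
        n = self.varint()
        return (n >> 1) ^ -(n & 1)

    def value(self, ttype):
        if ttype in (BOOL_TRUE, BOOL_FALSE):  # bool inside a container: one byte
            return self.take(1)[0] == BOOL_TRUE
        if ttype == BYTE:
            return struct.unpack("b", self.take(1))[0]
        if ttype in (I16, I32, I64):
            return self.zigzag()
        if ttype == DOUBLE:  # little-endian in the compact protocol
            return struct.unpack("<d", self.take(8))[0]
        if ttype == BINARY:
            return self.take(self.varint())
        if ttype in (LIST, SET):
            header = self.take(1)[0]
            size = header >> 4
            if size == 15:
                size = self.varint()
            return [self.value(header & 0x0F) for _ in range(size)]
        if ttype == MAP:
            size = self.varint()
            out = {}
            if size:
                kv = self.take(1)[0]  # key type in high nibble, value type in low
                for _ in range(size):
                    key = self.value(kv >> 4)
                    out[key] = self.value(kv & 0x0F)
            return out
        if ttype == STRUCT:
            return self.struct()
        raise ValueError(f"unknown compact type {ttype}")

    def struct(self):
        fields, last_id = {}, 0
        while True:
            header = self.take(1)[0]
            if header == 0:  # stop field
                return fields
            delta, ttype = header >> 4, header & 0x0F
            last_id = last_id + delta if delta else self.zigzag()
            if ttype in (BOOL_TRUE, BOOL_FALSE):  # bool field: value is in the header
                fields[last_id] = ttype == BOOL_TRUE
            else:
                fields[last_id] = self.value(ttype)


def decode_partition_stats(param_value):
    """Decode one base64 chunk into nested dicts keyed by Thrift field id."""
    return Reader(base64.b64decode(param_value)).struct()


PARAM_VALUE = "HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=="
stats = decode_partition_stats(PARAM_VALUE)
# TPartitionStats field 2 is intermediate_col_stats; field 3 of each entry is num_nulls.
for column, col_stats in stats[2].items():
    print(column.decode(), "num_nulls =", col_stats.get(3))
```

With the value from this comment, the str1 entry should be the one reporting num_nulls = -1. Results are keyed by field id rather than by name because the sketch does not load the Thrift schema.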
5. Restart the catalog and coordinator
to clear the table partition cache

6. Run compute incremental stats
compute incremental stats test_column_stats partition(ds=20200105);
The following exception is then raised:

[localhost:21000] default> compute incremental stats test_column_stats 
partition(ds=20200105);
Query: compute incremental stats test_column_stats partition(ds=20200107)
ERROR: TableLoadingException: Failed to load metadata for table: 
default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5})

> column stats num_nulls less than -1
> -----------------------------------
>
>                 Key: IMPALA-10230
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10230
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.4.0
>            Reporter: logan zheng
>            Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> After upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "compute incremental stats 
> default.test partition(xx=yyyy)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed under Impala 3.2.0, had been in use for a long 
> time, and already had stats computed. 
>  
>  
>   



