[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213126#comment-17213126
 ] 

logan.zheng edited comment on IMPALA-10230 at 10/13/20, 1:35 PM:
-----------------------------------------------------------------

h2. Reproducing this issue

ASF Impala 3.4
h3. 1. Create table

create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY 
(ds int) STORED AS PARQUET;
h3. 2. Create data

insert overwrite table test_column_stats partition(ds=20200101)
 select 'tt' str1 ,'20200101' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200103)
 select 'tt2' str1 ,'20200103' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200104)
 select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200105)
 select 'tt2' str1 ,'20200104' as str2 ,1 as int1;
h3. 3. Compute incremental stats

compute incremental stats test_column_stats partition(ds=20200101);
 compute incremental stats test_column_stats partition(ds=20200103);
 compute incremental stats test_column_stats partition(ds=20200104);
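h3. 4. Update the metastore

The table test_column_stats has TBL_ID = 92746 in this metastore. Look up its partition parameters:

 SELECT d.`NAME`, t.`TBL_NAME`, p.*, pp.*
 FROM `PARTITIONS` p, `TBLS` t, `DBS` d, `PARTITION_PARAMS` pp
 WHERE d.`NAME` = 'default' AND t.`TBL_NAME` = 'test_column_stats'
 AND t.`DB_ID` = d.`DB_ID`
 AND p.`TBL_ID` = t.`TBL_ID`
 AND p.`PART_ID` = pp.`PART_ID`
 AND p.`TBL_ID` = 92746;

Then overwrite the stored intermediate stats: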
{code:sql}
update PARTITION_PARAMS set 
PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=='
where PARAM_KEY='impala_intermediate_stats_chunk0'
  and PART_ID in (select PART_ID from PARTITIONS where TBL_ID=92746);
{code}
h5. PARAM_VALUE holds a serialized TPartitionStats object; the key point is that num_nulls is set to -1

// Intermediate state for the computation of per-column stats. Impala can aggregate these
// structures together to produce final stats for a column.
struct TIntermediateColumnStats {
  // One byte for each bucket of the NDV HLL computation
  1: optional binary intermediate_ndv

  // If true, intermediate_ndv is RLE-compressed
  2: optional bool is_ndv_encoded

  // Number of nulls seen so far (or -1 if nulls are not counted)
  3: optional i64 num_nulls

  // The maximum width, in bytes, of the column
  4: optional i32 max_width

  // The average width (in bytes) of the column
  5: optional double avg_width

  // The number of rows counted, needed to compute NDVs from intermediate_ndv
  6: optional i64 num_rows
}

// Per-partition statistics
struct TPartitionStats {
  // Number of rows gathered per-partition by non-incremental stats.
  // TODO: This can probably be removed in favour of the intermediate_col_stats, but doing
  // so would interfere with the non-incremental stats path
  1: required TTableStats stats

  // Intermediate state for incremental statistics, one entry per column name.
  2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
 }
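
The PARAM_VALUE from step 4 can be inspected outside Impala. The sketch below assumes the value is the Base64 text of a TPartitionStats serialized with Thrift's compact protocol; that is inferred from the byte layout and is not stated in this comment. The decoder is hand-written and deliberately minimal: it handles only the field types used by the two structs above, the function names are illustrative, and field ids are taken from the struct definitions (field 2 is intermediate_col_stats, field 3 of each entry is num_nulls).

```python
import base64
import struct

# PARAM_VALUE from the UPDATE statement in step 4, split into chunks for readability.
PARAM_VALUE = (
    "HBYCABsDjARpbnQxGAz/AK4AAAH/AP8A"
    "TwARFgAVABcAAAAAAAAQQBYCAARzdHIx"
    "GAj/AP8A/wD/ABEWARUAFwAAAAAAAAAA"
    "FgAABHN0cjIYDP8A/wD/AAAAAAH9ABEW"
    "ABUQFwAAAAAAACBAFgIAAA=="
)


def read_varint(buf, pos):
    """Read an unsigned base-128 varint; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def unzigzag(n):
    """Undo zigzag encoding (0, 1, 2, 3 ... back to 0, -1, 1, -2 ...)."""
    return (n >> 1) ^ -(n & 1)


def read_value(buf, pos, ctype):
    """Decode one value of the given compact-protocol type id."""
    if ctype in (1, 2):  # bool carried in the struct field header itself
        return ctype == 1, pos
    if ctype in (4, 5, 6):  # i16 / i32 / i64: zigzag varint
        raw, pos = read_varint(buf, pos)
        return unzigzag(raw), pos
    if ctype == 7:  # double: 8 bytes, little-endian
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if ctype == 8:  # binary / string: varint length, then bytes
        length, pos = read_varint(buf, pos)
        return bytes(buf[pos:pos + length]), pos + length
    if ctype == 11:  # map: varint size, one byte of key/value types, entries
        size, pos = read_varint(buf, pos)
        entries = {}
        if size:
            types = buf[pos]
            pos += 1
            for _ in range(size):
                key, pos = read_value(buf, pos, types >> 4)
                val, pos = read_value(buf, pos, types & 0x0F)
                entries[key] = val
        return entries, pos
    if ctype == 12:  # nested struct
        return read_struct(buf, pos)
    raise ValueError(f"type id {ctype} is not handled by this sketch")


def read_struct(buf, pos):
    """Decode a struct into {field id: value}; return (fields, next position)."""
    fields = {}
    field_id = 0
    while True:
        header = buf[pos]
        pos += 1
        if header == 0:  # stop byte ends the struct
            return fields, pos
        delta = header >> 4
        if delta:
            field_id += delta  # short form: id is relative to the previous field
        else:
            raw, pos = read_varint(buf, pos)  # long form: explicit zigzag id
            field_id = unzigzag(raw)
        value, pos = read_value(buf, pos, header & 0x0F)
        fields[field_id] = value


raw = base64.b64decode(PARAM_VALUE)
partition_stats, _ = read_struct(raw, 0)
num_nulls = {name.decode(): col[3] for name, col in partition_stats[2].items()}
print(num_nulls)
```

Under these assumptions the value decodes to three column entries (int1, str1, str2), and the str1 entry is the one carrying num_nulls = -1.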
  
h3. 5. Restart the catalog and coordinator

This clears the cached partition metadata for the table.
h3. 6. Execute compute incremental stats

compute incremental stats test_column_stats partition(ds=20200105);

The statement then fails with an exception:

 [localhost:21000] default> compute incremental stats test_column_stats partition(ds=20200105);
 Query: compute incremental stats test_column_stats partition(ds=20200107)
 ERROR: TableLoadingException: Failed to load metadata for table: default.test_column_stats
 CAUSED BY: IllegalStateException: ColumnStats\{avgSize_=3.0, avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5}
{code:java}
I1013 20:16:51.701009 1840603 HdfsTable.java:980] Reloading metadata for table 
definition and all partition(s) of default.test_column_stats (ALTER TABLE 
UPDATE_STATS)
I1013 20:16:51.851312 1840603 jni-util.cc:288] 
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
table: default.test_column_stats
        at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1032)
        at 
org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:935)
        at 
org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:848)
        at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:358)
        at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:173)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
        at 
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
        at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:454)
        at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:287)
        at org.apache.impala.catalog.Column.updateStats(Column.java:71)
        at 
org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:159)
        at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:376)
        at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:993)
        ... 4 more
I1013 20:16:51.851352 1834411 catalog-server.cc:737] Collected update: 
1:CATALOG_SERVICE_ID, version=312, original size=60, compressed size=58
I1013 20:16:51.851892 1840603 status.cc:126] TableLoadingException: Failed to 
load metadata for table: default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
    @           0xbf4ef9
    @          0x12e276e
    @           0xbdb0a7
    @           0xbc86b9
    @           0xce13ec
    @           0xcdf86c
    @           0xbb8f49
    @          0x1029af5
    @          0x101d545
    @          0x137488a
    @          0x1375759
    @          0x1b48a19
    @     0x7f814c55be24
    @     0x7f814915335c
E1013 20:16:51.851924 1840603 catalog-server.cc:114] TableLoadingException: 
Failed to load metadata for table: default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
{code}
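
One way to read the stack trace: ColumnStats.validate rejects the column stats because numNulls_ is below -1, and a value such as -2 is what results if per-partition num_nulls values are added together while some of them hold the -1 sentinel. The sketch below illustrates that arithmetic only. It assumes a plain sum over partitions, which this comment does not confirm, and the function names are hypothetical rather than Impala's.

```python
UNKNOWN = -1  # sentinel from the Thrift comment: "-1 if nulls are not counted"


def naive_merge(per_partition_num_nulls):
    """Add the values as-is, so every UNKNOWN entry subtracts one."""
    return sum(per_partition_num_nulls)


def sentinel_aware_merge(per_partition_num_nulls):
    """Report UNKNOWN if any partition lacks a null count, else the true sum."""
    if any(n == UNKNOWN for n in per_partition_num_nulls):
        return UNKNOWN
    return sum(per_partition_num_nulls)


def is_valid(num_nulls):
    """Invariant implied by the error: num_nulls must not be below -1."""
    return num_nulls >= UNKNOWN


# Two partitions without null counts and one partition with zero nulls.
partitions = [UNKNOWN, UNKNOWN, 0]
print(naive_merge(partitions), is_valid(naive_merge(partitions)))
print(sentinel_aware_merge(partitions), is_valid(sentinel_aware_merge(partitions)))
```

The second function is only one possible way to keep the merged value inside the valid range; treating unknown counts as zero would be another, with different accuracy trade-offs.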
 



> column stats num_nulls less than -1
> -----------------------------------
>
>                 Key: IMPALA-10230
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10230
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.4.0
>            Reporter: logan zheng
>            Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> After upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running 
> "compute incremental stats default.test partition(xx=yyyy)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed under Impala 3.2.0, had been in use 
> for a long time, and already had stats computed.


