[
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213126#comment-17213126
]
logan.zheng edited comment on IMPALA-10230 at 10/13/20, 1:42 PM:
-----------------------------------------------------------------
Reproduced this issue on ASF Impala 3.4.
h3. 1. create table
create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY
(ds int) STORED AS PARQUET;
h3. 2. create data
insert overwrite table test_column_stats partition(ds=20200101)
select 'tt' str1 ,'20200101' as str2 ,1 as int1;
insert overwrite table test_column_stats partition(ds=20200103)
select 'tt2' str1 ,'20200103' as str2 ,1 as int1;
insert overwrite table test_column_stats partition(ds=20200104)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;
insert overwrite table test_column_stats partition(ds=20200105)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;
h3. 3. compute incremental stats
compute incremental stats test_column_stats partition(ds=20200101);
compute incremental stats test_column_stats partition(ds=20200103);
compute incremental stats test_column_stats partition(ds=20200104);
h3. 4. update metastore
{code:java}
-- TBL_ID of test_column_stats is 92746
SELECT d.`NAME`, t.`TBL_NAME`, p.*, pp.*
FROM `PARTITIONS` p, `TBLS` t, `DBS` d, `PARTITION_PARAMS` pp
WHERE d.`NAME`='default' AND t.`TBL_NAME`='test_column_stats'
  AND t.DB_ID=d.DB_ID AND p.TBL_ID=t.TBL_ID
  AND p.PART_ID=pp.PART_ID AND p.TBL_ID=92746
{code}
h5. PARAM_VALUE holds the serialized TPartitionStats object; the key point is that num_nulls is set to -1
{code:java}
update PARTITION_PARAMS set
PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIYDP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=='
where PARAM_KEY='impala_intermediate_stats_chunk0'
{code}
{code:java}
// Intermediate state for the computation of per-column stats. Impala can aggregate these
// structures together to produce final stats for a column.
struct TIntermediateColumnStats {
// One byte for each bucket of the NDV HLL computation
1: optional binary intermediate_ndv
// If true, intermediate_ndv is RLE-compressed
2: optional bool is_ndv_encoded
// Number of nulls seen so far (or -1 if nulls are not counted)
3: optional i64 num_nulls
// The maximum width, in bytes, of the column
4: optional i32 max_width
// The average width (in bytes) of the column
5: optional double avg_width
// The number of rows counted, needed to compute NDVs from intermediate_ndv
6: optional i64 num_rows
}
// Per-partition statistics
struct TPartitionStats {
// Number of rows gathered per-partition by non-incremental stats.
// TODO: This can probably be removed in favour of the intermediate_col_stats, but doing
// so would interfere with the non-incremental stats path
1: required TTableStats stats
// Intermediate state for incremental statistics, one entry per column name.
2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}
{code}
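To double-check which column carries num_nulls=-1 in the PARAM_VALUE above, here is a minimal sketch of a decoder. It assumes the value is a base64-encoded Thrift compact-protocol serialization of TPartitionStats (that is how the bytes appear to parse; it is not confirmed by anything in this report). The helper names are hypothetical, it only handles the field types that occur in the two structs above, and it is not Impala code.

```python
import base64
import struct

# PARAM_VALUE copied from the UPDATE statement above, split only for line length.
PARAM_VALUE = (
    "HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcAAAAAAAAQQBYCAARzdHIx"
    "GAj/AP8A/wD/ABEWARUAFwAAAAAAAAAAFgAABHN0cjIY"
    "DP8A/wD/AAAAAAH9ABEWABUQFwAAAAAAACBAFgIAAA=="
)

def read_varint(buf, pos):
    """Read an unsigned LEB128 varint; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def unzigzag(n):
    """Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, ..."""
    return (n >> 1) ^ -(n & 1)

def read_value(buf, pos, ttype):
    """Decode one value of the given compact-protocol type id."""
    if ttype in (1, 2):   # bool struct field, value lives in the field header
        return ttype == 1, pos
    if ttype in (4, 5, 6):  # i16 / i32 / i64 as zigzag varints
        raw, pos = read_varint(buf, pos)
        return unzigzag(raw), pos
    if ttype == 7:        # double, 8 bytes little-endian
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if ttype == 8:        # binary / string, length-prefixed
        length, pos = read_varint(buf, pos)
        return bytes(buf[pos:pos + length]), pos + length
    if ttype == 11:       # map: size, then one byte of key/value types
        size, pos = read_varint(buf, pos)
        result = {}
        if size:
            ktype, vtype = buf[pos] >> 4, buf[pos] & 0x0F
            pos += 1
            for _ in range(size):
                key, pos = read_value(buf, pos, ktype)
                val, pos = read_value(buf, pos, vtype)
                result[key] = val
        return result, pos
    if ttype == 12:       # nested struct
        return read_struct(buf, pos)
    raise ValueError(f"unsupported compact type {ttype}")

def read_struct(buf, pos=0):
    """Decode a struct into {field id: value}; return (fields, next position)."""
    fields, field_id = {}, 0
    while True:
        header = buf[pos]
        pos += 1
        if header == 0:   # STOP marker
            return fields, pos
        delta, ttype = header >> 4, header & 0x0F
        if delta:
            field_id += delta
        else:             # long form: explicit zigzag field id follows
            raw, pos = read_varint(buf, pos)
            field_id = unzigzag(raw)
        fields[field_id], pos = read_value(buf, pos, ttype)

stats, _ = read_struct(base64.b64decode(PARAM_VALUE))
# Field 2 of TPartitionStats is intermediate_col_stats; field 3 of each entry is num_nulls.
for column, col_stats in stats[2].items():
    print(column.decode(), "num_nulls =", col_stats.get(3))
```

Field ids in the output map back to the struct definitions above (3 = num_nulls, 4 = max_width, 5 = avg_width, 6 = num_rows). Under the stated assumption, str1 is the entry that decodes to num_nulls = -1.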
h3. 5. restart catalog and coordinator
This clears the table partition cache.
h3. 6. execute compute incremental stats
compute incremental stats test_column_stats partition(ds=20200105);
The following exception is then raised:
{code:java}
[localhost:21000] default> compute incremental stats test_column_stats
partition(ds=20200105);
Query: compute incremental stats test_column_stats partition(ds=20200107)
ERROR: TableLoadingException: Failed to load metadata for table:
default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0,
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5}
{code}
{code:java}
I1013 20:16:51.701009 1840603 HdfsTable.java:980] Reloading metadata for table
definition and all partition(s) of default.test_column_stats (ALTER TABLE
UPDATE_STATS)
I1013 20:16:51.851312 1840603 jni-util.cc:288]
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for
table: default.test_column_stats
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1032)
at
org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:935)
at
org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:848)
at
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:358)
at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:173)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=3.0,
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
at
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:454)
at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:287)
at org.apache.impala.catalog.Column.updateStats(Column.java:71)
at
org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:159)
at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:376)
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:993)
... 4 more
I1013 20:16:51.851352 1834411 catalog-server.cc:737] Collected update:
1:CATALOG_SERVICE_ID, version=312, original size=60, compressed size=58
I1013 20:16:51.851892 1840603 status.cc:126] TableLoadingException: Failed to
load metadata for table: default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0,
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
@ 0xbf4ef9
@ 0x12e276e
@ 0xbdb0a7
@ 0xbc86b9
@ 0xce13ec
@ 0xcdf86c
@ 0xbb8f49
@ 0x1029af5
@ 0x101d545
@ 0x137488a
@ 0x1375759
@ 0x1b48a19
@ 0x7f814c55be24
@ 0x7f814915335c
E1013 20:16:51.851924 1840603 catalog-server.cc:114] TableLoadingException:
Failed to load metadata for table: default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0,
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
{code}
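The stack trace ends in ColumnStats.validate with numNulls_ below -1. One way this could happen is sketched below, assuming the per-partition num_nulls values are merged by plain summation with no special case for the -1 "not counted" marker. That is my assumption, not something shown in the logs. All function names are hypothetical, and the guarded variant is only one possible handling, not necessarily the right fix.

```python
def merge_num_nulls_naive(per_partition):
    """Sum per-partition null counts, ignoring that -1 means 'not counted'."""
    return sum(per_partition)

def merge_num_nulls_guarded(per_partition):
    """If any partition reports -1 (unknown), keep the merged value at -1."""
    if any(n < 0 for n in per_partition):
        return -1
    return sum(per_partition)

def validate(num_nulls):
    """Stand-in for the invariant implied by the exception: num_nulls >= -1."""
    if num_nulls < -1:
        raise ValueError(f"invalid numNulls_={num_nulls}")
    return num_nulls

# Two partitions that both carry the -1 marker.
partitions = [-1, -1]
print(merge_num_nulls_naive(partitions))    # sums the markers
print(merge_num_nulls_guarded(partitions))  # stays at the marker value
```

With the naive merge, two partitions that each carry -1 already yield -2, which the validate stand-in rejects. The guarded merge keeps the result at -1.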
> column stats num_nulls less than -1
> -----------------------------------
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 3.4.0
> Reporter: logan zheng
> Priority: Critical
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> After upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running
> "compute incremental stats default.test partition(xx=yyyy)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0,
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use
> for a long time, and already had stats computed.
>
>
>