[
https://issues.apache.org/jira/browse/IMPALA-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fang-Yu Rao updated IMPALA-11666:
---------------------------------
Description:
Currently, '{{hasCorruptTableStats_}}' of an HDFS table is set to true in
[HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java]
when one of the following holds.
# Its '{{cardinality_}}' is less than -1.
# The number of rows in one of its partitions is less than -1.
# The number of rows in one of its partitions is 0 but the total size of that
partition's files is greater than 0.
# The number of rows in the table is 0 but the total size of the table's files
is greater than 0.
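The four conditions above can be sketched as a standalone predicate. This is only an illustration and not the code in HdfsScanNode.java: the class {{CorruptStatsSketch}}, the nested {{PartitionStats}} holder, and all parameter names are hypothetical, and the real planner reads these values from catalog metadata rather than from method arguments.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the four conditions; not Impala's actual implementation.
final class CorruptStatsSketch {
  private CorruptStatsSketch() {}

  // Hypothetical holder for one partition's statistics; not an Impala class.
  static final class PartitionStats {
    final long numRows;         // row count recorded in the statistics
    final long totalFileBytes;  // total size of the partition's files

    PartitionStats(long numRows, long totalFileBytes) {
      this.numRows = numRows;
      this.totalFileBytes = totalFileBytes;
    }
  }

  static boolean hasCorruptTableStats(long cardinality, long tableNumRows,
      long tableFileBytes, List<PartitionStats> partitions) {
    // Condition 1: the cardinality is less than -1.
    if (cardinality < -1) return true;
    for (PartitionStats p : partitions) {
      // Condition 2: a partition reports fewer than -1 rows.
      if (p.numRows < -1) return true;
      // Condition 3: a partition reports 0 rows but has non-empty files.
      if (p.numRows == 0 && p.totalFileBytes > 0) return true;
    }
    // Condition 4: the table reports 0 rows but has non-empty files.
    return tableNumRows == 0 && tableFileBytes > 0;
  }

  public static void main(String[] args) {
    List<PartitionStats> none = Collections.emptyList();
    System.out.println(hasCorruptTableStats(0, 0, 1024, none));   // true
    System.out.println(hasCorruptTableStats(-1, -1, 1024, none)); // false
  }
}
```

Note that a value of exactly -1 does not trigger any of the conditions; only values below -1, or a row count of 0 combined with a positive file size, do.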
For such a table, the {{EXPLAIN}} output of any query involving the table
contains the message "{{WARNING: The following tables have potentially corrupt
table statistics. Drop and re-compute statistics to resolve this problem.}}"
The warning may be more alarming than necessary for an Impala user, especially
since Impala's frontend can set '{{hasCorruptTableStats_}}' to true for a table
whose statistics are not corrupt.
Specifically, a table without corrupt statistics whose
'{{hasCorruptTableStats_}}' is nevertheless set to true can be created as
follows after starting the Impala cluster.
# On the command line, execute "{{beeline -u
"jdbc:hive2://localhost:11050/default"}}" to enter beeline.
# In beeline, create a transactional table via "{{create table
test_db.test_tbl_01 (id int, name string) stored as orc tblproperties
('transactional'='true')}}".
# In beeline, insert a row into the new table via "{{insert into table
test_db.test_tbl_01 values (1, "Alex")}}".
# In beeline, delete the row just inserted via "{{delete from
test_db.test_tbl_01 where id = 1}}".
# In Impala shell, execute "{{compute stats test_db.test_tbl_01}}".
# In Impala shell, execute "{{explain select * from test_db.test_tbl_01}}"
to verify that the warning message described above appears in the output.
The table '{{test_tbl_01}}' above has 0 rows, but the total size of its files
is greater than 0.
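One plausible explanation, which is an assumption and not something stated in this issue: in a Hive transactional table the insert writes a delta directory and the delete writes a separate delete delta directory, and both remain on disk until compaction. The sketch below models that situation. The class {{AcidZeroRowSketch}}, the file paths, and the byte sizes are all hypothetical and are not taken from a real cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the table's files after the steps above; not Impala code.
final class AcidZeroRowSketch {
  private AcidZeroRowSketch() {}

  // Sums the sizes of all files that still exist under the table directory.
  static long totalBytes(Map<String, Long> fileSizes) {
    long total = 0;
    for (long size : fileSizes.values()) {
      total += size;
    }
    return total;
  }

  // The condition that triggers the warning for this table: 0 rows, non-empty files.
  static boolean zeroRowsButNonEmpty(long numRows, long totalFileBytes) {
    return numRows == 0 && totalFileBytes > 0;
  }

  public static void main(String[] args) {
    Map<String, Long> files = new LinkedHashMap<>();
    // Assumed layout: one file from the insert, one from the delete (sizes invented).
    files.put("delta_0000001_0000001_0000/bucket_00000", 700L);
    files.put("delete_delta_0000002_0000002_0000/bucket_00000", 650L);

    long liveRows = 1 - 1; // one row inserted, then the same row deleted
    System.out.println(totalBytes(files));                                // 1350
    System.out.println(zeroRowsButNonEmpty(liveRows, totalBytes(files))); // true
  }
}
```

Under this reading, the statistics are accurate (the table really has 0 rows) even though the check fires.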
It may be better to revise the warning message to something less alarming, as
shown below.
{code:java}
The number of rows in the following tables or in a partition of them has 0 or
fewer than -1 row but positive total file size.
This does not necessarily imply the existence of corrupt statistics.
In the case of corrupt statistics, drop and re-compute statistics could resolve
this problem.
{code}
was:
Currently, '{{{}hasCorruptTableStats_{}}}' of an HDFS table is set to true when
one of the following is true in
[HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java].
# Its '{{{}cardinality_{}}}' is -1.
# The number of rows in one of its partition is less than -1.
# The number of rows in one of its partition is 0 but the size of the
associated files of this partition is greater than 0.
# The number of rows in the table is 0 but the size of the associated files of
this table is greater than 0.
For such a table, the {{EXPLAIN}} statement for queries involving the table
would contain the message of "{{{}WARNING: The following tables have
potentially corrupt table statistics. Drop and re-compute statistics to resolve
this problem.{}}}"
The warning message may be a bit too scary for an Impala user especially if we
consider the fact that a table without corrupt statistics could indeed have its
'{{{}hasCorruptTableStats_{}}}' set to true by Impala's frontend.
Specifically, a table without corrupt statistics but having its
'{{{}hasCorruptTableStats_{}}}' set to 1 could be created as follows after
starting the Impala cluster.
# Execute on the command line "{{{}beeline -u
"jdbc:hive2://localhost:11050/default"{}}}" to enter beeline.
# Create a transactional table in beeline via "{{{}create table
test_db.test_tbl_01 (id int, name string) stored as orc tblproperties
('transactional'='true'){}}}".
# Insert a row into the table just created in beeline via "{{{}insert into
table test_db.test_tbl_01 (1, "Alex");{}}}".
# Delete the row just inserted in beeline via "{{{}delete from
test_db.test_tbl_01 where id = 1{}}}".
# In Impala shell, execute "{{compute stats test_db.test_tbl_01}}".
# In Impala shell, execute "{{{}explain select * from test_db.test_tbl_01{}}}"
to verify that the warning message described above appears in the output.
The table '{{{}test_tbl_01{}}}' above has 0 row but the associated file size is
greater than 0.
It may be better that we revise the warning message to something less scary as
shown below.
{code:java}
The number of rows in the following tables or in a partition of them has 0 or
fewer than -1 row but positive total file size.
This does not necessarily imply the existence of corrupt statistics.
In the case of corrupt statistics, drop and re-compute statistics could resolve
this problem.
{code}
> Consider revising the warning message when hasCorruptTableStats_ is true for
> a table
> ------------------------------------------------------------------------------------
>
> Key: IMPALA-11666
> URL: https://issues.apache.org/jira/browse/IMPALA-11666
> Project: IMPALA
> Issue Type: Task
> Components: Frontend
> Reporter: Fang-Yu Rao
> Assignee: Fang-Yu Rao
> Priority: Major
>