Fang-Yu Rao created IMPALA-11666:
------------------------------------

             Summary: Consider revising the warning message when 
hasCorruptTableStats_ is true for a table
                 Key: IMPALA-11666
                 URL: https://issues.apache.org/jira/browse/IMPALA-11666
             Project: IMPALA
          Issue Type: Task
          Components: Frontend
            Reporter: Fang-Yu Rao
            Assignee: Fang-Yu Rao


Currently, '{{hasCorruptTableStats_}}' of an HDFS table is set to true when 
one of the following conditions holds in 
[HdfsScanNode.java|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java].
 # Its '{{cardinality_}}' is -1.
 # The number of rows in one of its partitions is less than -1.
 # The number of rows in one of its partitions is 0 but the size of the 
associated files of this partition is greater than 0.
 # The number of rows in the table is 0 but the size of the associated files of 
this table is greater than 0.
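
The four conditions listed above can be sketched as a single predicate. This is only an illustration of the conditions as written in this issue, not the code in HdfsScanNode.java; the class {{CorruptStatsSketch}}, the holder {{PartitionStats}}, and all parameter names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names exist in Impala.
class CorruptStatsSketch {

    /** Hypothetical per-partition statistics holder. */
    static final class PartitionStats {
        final long numRows;
        final long sizeBytes;

        PartitionStats(long numRows, long sizeBytes) {
            this.numRows = numRows;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns true when any of the four conditions from the issue text holds.
     * The parameters are assumptions made for illustration only.
     */
    static boolean hasCorruptTableStats(long cardinality, long tableNumRows,
            long tableSizeBytes, List<PartitionStats> partitions) {
        // Condition 1: the cardinality is -1.
        if (cardinality == -1) return true;
        for (PartitionStats p : partitions) {
            // Condition 2: a partition reports fewer than -1 rows.
            if (p.numRows < -1) return true;
            // Condition 3: a partition reports 0 rows but has non-empty files.
            if (p.numRows == 0 && p.sizeBytes > 0) return true;
        }
        // Condition 4: the table reports 0 rows but has non-empty files.
        return tableNumRows == 0 && tableSizeBytes > 0;
    }

    public static void main(String[] args) {
        // A table with 0 rows whose files still occupy space is flagged.
        System.out.println(hasCorruptTableStats(
                0, 0, 700, Collections.<PartitionStats>emptyList())); // true

        // A table whose row counts and file sizes agree is not flagged.
        System.out.println(hasCorruptTableStats(
                100, 100, 2048,
                Arrays.asList(new PartitionStats(100, 2048)))); // false
    }
}
```

The first call in {{main}} corresponds to conditions 3 and 4: a row count of 0 combined with a positive file size sets the flag even when the statistics accurately describe the table.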

For such a table, the {{EXPLAIN}} output for queries involving the table 
contains the message "{{WARNING: The following tables have 
potentially corrupt table statistics. Drop and re-compute statistics to resolve 
this problem.}}"

The warning message may be more alarming than warranted for an Impala user, 
especially since a table without corrupt statistics can still have its 
'{{hasCorruptTableStats_}}' set to true by Impala's frontend.

Specifically, a table without corrupt statistics but with its 
'{{hasCorruptTableStats_}}' set to true can be created as follows after 
starting the Impala cluster.
 # Execute on the command line "{{beeline -u 
"jdbc:hive2://localhost:11050/default"}}" to enter beeline.
 # Create a transactional table in beeline via "{{create table 
test_db.test_tbl_01 (id int, name string) stored as orc tblproperties 
('transactional'='true')}}".
 # Insert a row into the table just created in beeline via "{{insert into 
table test_db.test_tbl_01 values (1, "Alex")}}".
 # Delete the row just inserted in beeline via "{{delete from 
test_db.test_tbl_01 where id = 1}}".
 # In Impala shell, execute "{{explain select * from test_db.test_tbl_01}}" 
to verify that the warning message described above appears in the output.

The table '{{test_tbl_01}}' above has 0 rows but the associated file size is 
greater than 0.

It may be better to revise the warning message to something less alarming, as 
shown below.
{code:java}
The following tables, or one of their partitions, have a row count of 0 or 
less than -1 but a positive total file size.
This does not necessarily imply the existence of corrupt statistics.
If the statistics are corrupt, dropping and re-computing them could resolve 
this problem.
{code}


