[ 
https://issues.apache.org/jira/browse/IMPALA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875266#comment-16875266
 ] 

Fang-Yu Rao edited comment on IMPALA-8730 at 6/28/19 10:37 PM:
---------------------------------------------------------------

Thanks for pointing this out Anurag.

After trying to execute the following SQL statement "SHOW TABLE STATS 
functional.alltypes;" from the Impala shell on my local dev box, I found that 
the table functional.alltypes DOES have statistical information, i.e., #Rows is 
not "-1" for each partition. That is, the cardinality returned by this SQL 
statement should be deterministic. However, the patch set for IMPALA-7608 
([https://gerrit.cloudera.org/c/12974/]) should not affect the code path when 
an hdfs table already has the statistical information and thus I suspect 
IMPALA-7608 might not cause this problem.

For easy reference, I also provide the returned results of the SQL statement in 
the following.
{code:java}
+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| year  | month | #Rows | #Files | Size     | Bytes Cached | Cache Replication 
| Format | Incremental stats | Location                                         
                 |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| 2009  | 1     | 310   | 1      | 19.95KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=1  |

| 2009  | 2     | 280   | 1      | 18.12KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=2  |

| 2009  | 3     | 310   | 1      | 20.06KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=3  |

| 2009  | 4     | 300   | 1      | 19.61KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=4  |

| 2009  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5  |

| 2009  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=6  |

| 2009  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=7  |

| 2009  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=8  |

| 2009  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=9  |

| 2009  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=10 |

| 2009  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11 |

| 2009  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=12 |

| 2010  | 1     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=1  |

| 2010  | 2     | 280   | 1      | 18.39KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=2  |

| 2010  | 3     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=3  |

| 2010  | 4     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=4  |

| 2010  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=5  |

| 2010  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=6  |

| 2010  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=7  |

| 2010  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=8  |

| 2010  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=9  |

| 2010  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10 |

| 2010  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=11 |

| 2010  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=12 |

| Total |       | 7300  | 24     | 478.45KB | 0B           |                   
|        |                   |                                                  
                 |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+{code}
In addition, I also tried to execute the following SQL statements on my local 
dev box to get the corresponding query plan. The SQL statements are the same as 
the SQL statements issued in that EE test 
([https://github.com/apache/impala/blob/master/tests/metadata/test_explain.py#L110-L113]).
{code:java}
SET EXPLAIN_LEVEL=3;
EXPLAIN SELECT * FROM functional.alltypes;{code}
The resulting query plan is provided as follows. It can be seen that the 
cardinality is 7.30K instead of 7.00K. That being said, if we've got '7.00K' in 
the test, then there seems to be some missing rows/partitions in that table 
(i.e., functional.alltypes). 
{code:java}
+----------------------------------------------------------------------------------------+

| Explain String                                                                
         |

+----------------------------------------------------------------------------------------+

| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3                   
         |

| Per-Host Resource Estimates: Memory=128MB                                     
         |

| Codegen disabled by planner                                                   
         |

| Analyzed query: SELECT * FROM functional.alltypes                             
         |

|                                                                               
         |

| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
         |

| Per-Host Resources: mem-estimate=490.49KB mem-reservation=0B 
thread-reservation=1      |

|   PLAN-ROOT SINK                                                              
         |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
         |

|   |                                                                           
         |

|   01:EXCHANGE [UNPARTITIONED]                                                 
         |

|      mem-estimate=490.49KB mem-reservation=0B thread-reservation=0            
         |

|      tuple-ids=0 row-size=89B cardinality=7.30K                               
         |

|      in pipelines: 00(GETNEXT)                                                
         |

|                                                                               
         |

| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                
         |

| Per-Host Resources: mem-estimate=128.00MB mem-reservation=32.00KB 
thread-reservation=2 |

|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]                  
         |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
         |

|   00:SCAN HDFS [functional.alltypes, RANDOM]                                  
         |

|      HDFS partitions=24/24 files=24 size=478.45KB                             
         |

|      stored statistics:                                                       
         |

|        table: rows=7.30K size=478.45KB                                        
         |

|        partitions: 24/24 rows=7.30K                                           
         |

|        columns: all                                                           
         |

|      extrapolated-rows=disabled max-scan-range-rows=310                       
         |

|      mem-estimate=128.00MB mem-reservation=32.00KB thread-reservation=1       
         |

|      tuple-ids=0 row-size=89B cardinality=7.30K                               
         |

|      in pipelines: 00(GETNEXT)                                                
         |

+----------------------------------------------------------------------------------------+


{code}


was (Author: fangyurao):
Thanks for pointing this out Anurag.

After trying to execute the following SQL statement "SHOW TABLE STATS 
functional.alltypes;" from the Impala shell on my local dev box, I found that 
the table functional.alltypes DOES have statistical information, i.e., #Rows is 
not "-1" for each partition. That is, the cardinality returned by this SQL 
statement should be deterministic. However, the patch set for IMPALA-7608 
([https://gerrit.cloudera.org/c/12974/]) should not affect the code path when 
an hdfs table already has the statistical information and thus I suspect 
IMPALA-7608 might not cause this problem.

For easy reference, I also provide the returned results of the SQL statement in 
the following.
{code:java}
+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| year  | month | #Rows | #Files | Size     | Bytes Cached | Cache Replication 
| Format | Incremental stats | Location                                         
                 |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| 2009  | 1     | 310   | 1      | 19.95KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=1  |

| 2009  | 2     | 280   | 1      | 18.12KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=2  |

| 2009  | 3     | 310   | 1      | 20.06KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=3  |

| 2009  | 4     | 300   | 1      | 19.61KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=4  |

| 2009  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5  |

| 2009  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=6  |

| 2009  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=7  |

| 2009  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=8  |

| 2009  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=9  |

| 2009  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=10 |

| 2009  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11 |

| 2009  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=12 |

| 2010  | 1     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=1  |

| 2010  | 2     | 280   | 1      | 18.39KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=2  |

| 2010  | 3     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=3  |

| 2010  | 4     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=4  |

| 2010  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=5  |

| 2010  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=6  |

| 2010  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=7  |

| 2010  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=8  |

| 2010  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=9  |

| 2010  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10 |

| 2010  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=11 |

| 2010  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=12 |

| Total |       | 7300  | 24     | 478.45KB | 0B           |                   
|        |                   |                                                  
                 |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+{code}
In addition, I also tried to execute the following SQL statement on my local 
dev box to get the corresponding query plan, which is the same as the SQL 
statement issued in that EE test 
([https://github.com/apache/impala/blob/master/tests/metadata/test_explain.py#L110-L113]).
{code:java}
SET EXPLAIN_LEVEL=3;
EXPLAIN SELECT * FROM functional.alltypes;{code}
The resulting query plan is provided as follows. It can be seen that the 
cardinality is 7.30K instead of 7.00K. That being said, if we've got '7.00K' in 
the test, then there seems to be some missing rows/partitions in that table 
(i.e., functional.alltypes). 
{code:java}
+----------------------------------------------------------------------------------------+

| Explain String                                                                
         |

+----------------------------------------------------------------------------------------+

| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3                   
         |

| Per-Host Resource Estimates: Memory=128MB                                     
         |

| Codegen disabled by planner                                                   
         |

| Analyzed query: SELECT * FROM functional.alltypes                             
         |

|                                                                               
         |

| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
         |

| Per-Host Resources: mem-estimate=490.49KB mem-reservation=0B 
thread-reservation=1      |

|   PLAN-ROOT SINK                                                              
         |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
         |

|   |                                                                           
         |

|   01:EXCHANGE [UNPARTITIONED]                                                 
         |

|      mem-estimate=490.49KB mem-reservation=0B thread-reservation=0            
         |

|      tuple-ids=0 row-size=89B cardinality=7.30K                               
         |

|      in pipelines: 00(GETNEXT)                                                
         |

|                                                                               
         |

| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                
         |

| Per-Host Resources: mem-estimate=128.00MB mem-reservation=32.00KB 
thread-reservation=2 |

|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]                  
         |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
         |

|   00:SCAN HDFS [functional.alltypes, RANDOM]                                  
         |

|      HDFS partitions=24/24 files=24 size=478.45KB                             
         |

|      stored statistics:                                                       
         |

|        table: rows=7.30K size=478.45KB                                        
         |

|        partitions: 24/24 rows=7.30K                                           
         |

|        columns: all                                                           
         |

|      extrapolated-rows=disabled max-scan-range-rows=310                       
         |

|      mem-estimate=128.00MB mem-reservation=32.00KB thread-reservation=1       
         |

|      tuple-ids=0 row-size=89B cardinality=7.30K                               
         |

|      in pipelines: 00(GETNEXT)                                                
         |

+----------------------------------------------------------------------------------------+


{code}

> TestExplain.test_explain_validate_cardinality_estimates flakiness due to 
> reliance on exact cardinality numbers.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8730
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8730
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.3.0
>            Reporter: Anurag Mantripragada
>            Assignee: Fang-Yu Rao
>            Priority: Critical
>              Labels: broken-build
>
> Another flaky test uncovered due to IMPALA-7608. The test relies on exact 
> cardinality numbers but after IMPALA-7608, these could be indeterministic. 
>  
> Below is the error message:
> {code:java}
> metadata/test_explain.py:113: in test_explain_validate_cardinality_estimates  
>    check_cardinality(result.data, '7.30K') metadata/test_explain.py:98: in 
> check_cardinality     query_result, 
> expected_cardinality=expected_cardinality) metadata/test_explain.py:86: in 
> check_row_size_and_cardinality     assert m.groups()[1] == 
> expected_cardinality E   assert '7.00K' == '7.30K' E     - 7.00K E     ?   ^ 
> E     + 7.30K E     ?   ^{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to