[ 
https://issues.apache.org/jira/browse/IMPALA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875266#comment-16875266
 ] 

Fang-Yu Rao commented on IMPALA-8730:
-------------------------------------

Thanks for pointing this out Anurag.

After trying to execute the following SQL statement "SHOW TABLE STATS 
functional.alltypes;" from the Impala shell, I found that the table 
functional.alltypes DOES have statistical information, i.e., #Rows is not "-1" 
for each partition. That is, the cardinality returned by this SQL statement 
should be deterministic. However, the patch set for IMPALA-7608 
([https://gerrit.cloudera.org/c/12974/]) should not affect the code path when 
an hdfs table already has the statistical information and thus I suspect 
IMPALA-7608 might not cause this problem.

For easy reference, I also provide the returned results of the SQL statement in 
the following.

 

 
{code:java}
+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| year  | month | #Rows | #Files | Size     | Bytes Cached | Cache Replication 
| Format | Incremental stats | Location                                         
                 |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+

| 2009  | 1     | 310   | 1      | 19.95KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=1  |

| 2009  | 2     | 280   | 1      | 18.12KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=2  |

| 2009  | 3     | 310   | 1      | 20.06KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=3  |

| 2009  | 4     | 300   | 1      | 19.61KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=4  |

| 2009  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5  |

| 2009  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=6  |

| 2009  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=7  |

| 2009  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=8  |

| 2009  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=9  |

| 2009  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=10 |

| 2009  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11 |

| 2009  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=12 |

| 2010  | 1     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=1  |

| 2010  | 2     | 280   | 1      | 18.39KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=2  |

| 2010  | 3     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=3  |

| 2010  | 4     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=4  |

| 2010  | 5     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=5  |

| 2010  | 6     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=6  |

| 2010  | 7     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=7  |

| 2010  | 8     | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=8  |

| 2010  | 9     | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=9  |

| 2010  | 10    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10 |

| 2010  | 11    | 300   | 1      | 19.71KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=11 |

| 2010  | 12    | 310   | 1      | 20.36KB  | NOT CACHED   | NOT CACHED        
| TEXT   | false             | 
hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=12 |

| Total |       | 7300  | 24     | 478.45KB | 0B           |                   
|        |                   |                                                  
                 |

+-------+-------+-------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------+{code}
In addition, I also tried to execute the following SQL statement to get the 
corresponding query plan, which is the same as the SQL statement issued in that 
EE test 
([https://github.com/apache/impala/blob/master/tests/metadata/test_explain.py#L110-L113]).

 
{code:java}
SET EXPLAIN_LEVEL=3;
EXPLAIN SELECT * FROM functional.alltypes;{code}
The resulting query plan is provided as follows. It can be seen that the 
cardinality is 7.30K instead of 7.00K. That being said, if we've got '7.00K' in 
the test, then there seems to be some missing rows/partitions in that table 
(i.e., functional.alltypes).

 

 
{code:java}
+----------------------------------------------------------------------------------------+

| Explain String                                                                
         |

+----------------------------------------------------------------------------------------+

| Max Per-Host Resource Reservation: Memory=32.00KB Threads=3                   
         |

| Per-Host Resource Estimates: Memory=128MB                                     
         |

| Codegen disabled by planner                                                   
         |

| Analyzed query: SELECT * FROM functional.alltypes                             
         |

|                                                                               
         |

| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
         |

| Per-Host Resources: mem-estimate=490.49KB mem-reservation=0B 
thread-reservation=1      |

|   PLAN-ROOT SINK                                                              
         |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
         |

|   |                                                                           
         |

|   01:EXCHANGE [UNPARTITIONED]                                                 
         |

|      mem-estimate=490.49KB mem-reservation=0B thread-reservation=0            
         |

|      tuple-ids=0 row-size=89B cardinality=7.30K                               
         |

|      in pipelines: 00(GETNEXT)                                                
         |

|                                                                               
         |

| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                
         |

| Per-Host Resources: mem-estimate=128.00MB mem-reservation=32.00KB 
thread-reservation=2 |

|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]                  
         |

|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
         |

|   00:SCAN HDFS [functional.alltypes, RANDOM]                                  
         |

|      HDFS partitions=24/24 files=24 size=478.45KB                             
         |

|      stored statistics:                                                       
         |

|        table: rows=7.30K size=478.45KB                                        
         |

|        partitions: 24/24 rows=7.30K                                           
         |

|        columns: all                                                           
         |

|      extrapolated-rows=disabled max-scan-range-rows=310                       
         |

|      mem-estimate=128.00MB mem-reservation=32.00KB thread-reservation=1       
         |

|      tuple-ids=0 row-size=89B cardinality=7.30K                               
         |

|      in pipelines: 00(GETNEXT)                                                
         |

+----------------------------------------------------------------------------------------+
{code}
 

 

 

> TestExplain.test_explain_validate_cardinality_estimates flakiness due to 
> reliance on exact cardinality numbers.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8730
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8730
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.3.0
>            Reporter: Anurag Mantripragada
>            Assignee: Fang-Yu Rao
>            Priority: Critical
>              Labels: broken-build
>
> Another flaky test uncovered due to IMPALA-7608. The test relies on exact 
> cardinality numbers but after IMPALA-7608, these could be indeterministic. 
>  
> Below is the error message:
> {code:java}
> metadata/test_explain.py:113: in test_explain_validate_cardinality_estimates  
>    check_cardinality(result.data, '7.30K') metadata/test_explain.py:98: in 
> check_cardinality     query_result, 
> expected_cardinality=expected_cardinality) metadata/test_explain.py:86: in 
> check_row_size_and_cardinality     assert m.groups()[1] == 
> expected_cardinality E   assert '7.00K' == '7.30K' E     - 7.00K E     ?   ^ 
> E     + 7.30K E     ?   ^{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to