[jira] [Commented] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)

Hive QA (JIRA) Thu, 15 Dec 2016 11:36:52 -0800

    [ https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752287#comment-15752287 ]


Hive QA commented on HIVE-15122:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843446/HIVE-15122.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10818 tests executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] (batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_casts] (batchId=152)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2593/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2593/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2593/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843446 - PreCommit-HIVE-Build

> Hive: Upcasting types should not obscure stats (min/max/ndv)
> ------------------------------------------------------------
>
>                 Key: HIVE-15122
>                 URL: https://issues.apache.org/jira/browse/HIVE-15122
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-15122.patch
>
>
> A UDFToLong cast breaks PK/FK inference and triggers mis-estimation of joins in LLAP.
> Snippet from the bad plan:
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:
> |       Vertices:
> |         Map 1
> |             Map Operator Tree:
> |                 TableScan
> |                   alias: supplier
> |                   filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                   Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                   Filter Operator
> |                     predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                     Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                     Select Operator
> |                       expressions: s_suppkey (type: bigint), s_nationkey (type: bigint)
> |                       outputColumnNames: _col0, _col1
> |                       Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                       Reduce Output Operator
> |                         key expressions: _col0 (type: bigint)
> |                         sort order: +
> |                         Map-reduce partition columns: _col0 (type: bigint)
> |                         Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                         value expressions: _col1 (type: bigint)
> |             Execution mode: vectorized, llap
> |             LLAP IO: all inputs
> |         Map 2
> |             Map Operator Tree:
> |                 TableScan
> |                   alias: lineitem
> |                   filterExpr: (l_suppkey is not null and l_orderkey is not null) (type: boolean)
> |                   Statistics: Num rows: 2285121364 Data size: 63983407882 Basic stats: COMPLETE Column stats: PARTIAL
> |                   Filter Operator
> |                     predicate: (l_suppkey is not null and l_orderkey is not null) (type: boolean)
> |                     Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL
> |                     Select Operator
> |                       expressions: l_orderkey (type: bigint), l_suppkey (type: int), l_extendedprice (type: double), l_discount (type: double), l_shipdate (type: date)
> |                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
> |                       Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL
> |                       Map Join Operator
> |                         condition map:
> |                              Inner Join 0 to 1
> |                         keys:
> |                           0 _col0 (type: bigint)
> |                           1 UDFToLong(_col1) (type: bigint)
> |                         outputColumnNames: _col1, _col2, _col4, _col5, _col6
> |                         input vertices:
> |                           0 Map 1
> |                         Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL
> |                         Reduce Output Operator
> |                           key expressions: _col2 (type: bigint)
> |                           sort order: +
> |                           Map-reduce partition columns: _col2 (type: bigint)
> |                           Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL
> |                           value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date)
> |             Execution mode: vectorized, llap
> |             LLAP IO: all inputs
> |         Map 3
> |             Map Operator Tree:
> |                 TableScan
> |                   alias: orders
> |                   filterExpr: (o_orderkey is not null and o_custkey is not null) (type: boolean)
> |                   Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
> |                   Filter Operator
> |                     predicate: (o_orderkey is not null and o_custkey is not null) (type: boolean)
> |                     Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
> |                     Select Operator
> |                       expressions: o_orderkey (type: int), o_custkey (type: bigint)
> |                       outputColumnNames: _col0, _col1
> |                       Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
> |                       Map Join Operator
> |                         condition map:
> |                              Inner Join 0 to 1
> |                         keys:
> |                           0 _col2 (type: bigint)
> |                           1 UDFToLong(_col0) (type: bigint)
> |                         outputColumnNames: _col1, _col4, _col5, _col6, _col8
> |                         input vertices:
> |                           0 Map 2
> |                         Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE
> |                         Reduce Output Operator
> |                           key expressions: _col8 (type: bigint)
> |                           sort order: +
> |                           Map-reduce partition columns: _col8 (type: bigint)
> |                           Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE
> |                           value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date)
> |             Execution mode: vectorized, llap
> |             LLAP IO: all inputs
> |         Map 7
> {code}
> Note the Map 2 to Map 3 output.
> This causes a rather large join (120GB) to be categorized as a map-join.
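
The issue title asks that upcasts not obscure min/max/ndv. As an illustration only, the Java sketch below shows why a widening integral cast, such as the UDFToLong applied to l_suppkey (int to bigint) in the plan above, can in principle pass column statistics through unchanged: the cast is injective and order-preserving, so min, max and ndv are identical before and after it. All names here (CastStatsSketch, ColStats, IntType, castStats, joinRows) are hypothetical and are not Hive's API, and the join estimate uses the textbook |R| * |S| / max(ndv_R, ndv_S) formula, which is not necessarily what Hive's optimizer applies.

```java
import java.util.Optional;

/**
 * Hypothetical sketch, not Hive's API: column statistics carried through a
 * widening integral cast, plus a textbook equi-join cardinality estimate.
 */
final class CastStatsSketch {

    /** Minimal per-column statistics (illustrative names only). */
    static final class ColStats {
        final long min;
        final long max;
        final long ndv;

        ColStats(long min, long max, long ndv) {
            this.min = min;
            this.max = max;
            this.ndv = ndv;
        }
    }

    /** Integral types ordered from narrowest to widest (hypothetical stand-in for Hive's type info). */
    enum IntType { TINYINT, SMALLINT, INT, BIGINT }

    /**
     * A widening cast maps distinct values to distinct values and keeps their
     * order, so min, max and ndv carry over unchanged. A narrowing cast may
     * overflow, so this sketch returns no stats rather than guessing.
     */
    static Optional<ColStats> castStats(ColStats in, IntType from, IntType to) {
        if (to.ordinal() >= from.ordinal()) {
            return Optional.of(new ColStats(in.min, in.max, in.ndv));
        }
        return Optional.empty();
    }

    /** Textbook equi-join estimate: |R join S| ~= |R| * |S| / max(ndv_R, ndv_S). */
    static long joinRows(long rowsR, long rowsS, long ndvR, long ndvS) {
        long denom = Math.max(1L, Math.max(ndvR, ndvS));
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(rowsR, rowsS) / denom;
    }
}
```

Assuming, for illustration only, that l_suppkey has 10,000,000 distinct values (the plan reports only PARTIAL column stats for lineitem), joinRows(2285121364L, 10_000_000L, 10_000_000L, 10_000_000L) returns 2285121364: every lineitem row survives the PK/FK join with supplier, rather than the 10,000,000 rows estimated for the Map 2 join output.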



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
