[
https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752287#comment-15752287
]
Hive QA commented on HIVE-15122:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843446/HIVE-15122.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10818 tests executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] (batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_casts] (batchId=152)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=216)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2593/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2593/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2593/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12843446 - PreCommit-HIVE-Build
> Hive: Upcasting types should not obscure stats (min/max/ndv)
> ------------------------------------------------------------
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
> Issue Type: Bug
> Reporter: Siddharth Seth
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in
> LLAP.
> Snippet from the bad plan.
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:
> |       Vertices:
> |         Map 1
> |             Map Operator Tree:
> |                 TableScan
> |                   alias: supplier
> |                   filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                   Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                   Filter Operator
> |                     predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                     Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                     Select Operator
> |                       expressions: s_suppkey (type: bigint), s_nationkey (type: bigint)
> |                       outputColumnNames: _col0, _col1
> |                       Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                       Reduce Output Operator
> |                         key expressions: _col0 (type: bigint)
> |                         sort order: +
> |                         Map-reduce partition columns: _col0 (type: bigint)
> |                         Statistics: Num rows: 10000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
> |                         value expressions: _col1 (type: bigint)
> |             Execution mode: vectorized, llap
> |             LLAP IO: all inputs
> |         Map 2
> |             Map Operator Tree:
> |                 TableScan
> |                   alias: lineitem
> |                   filterExpr: (l_suppkey is not null and l_orderkey is not null) (type: boolean)
> |                   Statistics: Num rows: 2285121364 Data size: 63983407882 Basic stats: COMPLETE Column stats: PARTIAL
> |                   Filter Operator
> |                     predicate: (l_suppkey is not null and l_orderkey is not null) (type: boolean)
> |                     Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL
> |                     Select Operator
> |                       expressions: l_orderkey (type: bigint), l_suppkey (type: int), l_extendedprice (type: double), l_discount (type: double), l_shipdate (type: date)
> |                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
> |                       Statistics: Num rows: 2285121364 Data size: 127966796384 Basic stats: COMPLETE Column stats: PARTIAL
> |                       Map Join Operator
> |                         condition map:
> |                              Inner Join 0 to 1
> |                         keys:
> |                           0 _col0 (type: bigint)
> |                           1 UDFToLong(_col1) (type: bigint)
> |                         outputColumnNames: _col1, _col2, _col4, _col5, _col6
> |                         input vertices:
> |                           0 Map 1
> |                         Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL
> |                         Reduce Output Operator
> |                           key expressions: _col2 (type: bigint)
> |                           sort order: +
> |                           Map-reduce partition columns: _col2 (type: bigint)
> |                           Statistics: Num rows: 10000000 Data size: 880000000 Basic stats: COMPLETE Column stats: PARTIAL
> |                           value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date)
> |             Execution mode: vectorized, llap
> |             LLAP IO: all inputs
> |         Map 3
> |             Map Operator Tree:
> |                 TableScan
> |                   alias: orders
> |                   filterExpr: (o_orderkey is not null and o_custkey is not null) (type: boolean)
> |                   Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
> |                   Filter Operator
> |                     predicate: (o_orderkey is not null and o_custkey is not null) (type: boolean)
> |                     Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
> |                     Select Operator
> |                       expressions: o_orderkey (type: int), o_custkey (type: bigint)
> |                       outputColumnNames: _col0, _col1
> |                       Statistics: Num rows: 4318801126 Data size: 51825626753 Basic stats: COMPLETE Column stats: NONE
> |                       Map Join Operator
> |                         condition map:
> |                              Inner Join 0 to 1
> |                         keys:
> |                           0 _col2 (type: bigint)
> |                           1 UDFToLong(_col0) (type: bigint)
> |                         outputColumnNames: _col1, _col4, _col5, _col6, _col8
> |                         input vertices:
> |                           0 Map 2
> |                         Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE
> |                         Reduce Output Operator
> |                           key expressions: _col8 (type: bigint)
> |                           sort order: +
> |                           Map-reduce partition columns: _col8 (type: bigint)
> |                           Statistics: Num rows: 4750681341 Data size: 57008190663 Basic stats: COMPLETE Column stats: NONE
> |                           value expressions: _col1 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: date)
> |             Execution mode: vectorized, llap
> |             LLAP IO: all inputs
> |         Map 7
> {code}
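> Note the Map 2 to Map 3 output: the Map Join Operator in Map 2, keyed on
> UDFToLong(_col1), is estimated at only 10000000 rows (Data size: 880000000).
> This causes a rather large join (120GB) to be categorized as a map-join.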
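The idea in the issue title can be illustrated with a small sketch. This is not Hive's implementation: the `ColumnStats` class, the `upcast_stats` function, and the sample numbers are hypothetical, chosen only to show that widening an integer column (for example int to bigint, as UDFToLong does for l_suppkey above) leaves min, max, and NDV unchanged, so an estimator could in principle carry them through the cast instead of discarding them.

```python
from dataclasses import dataclass, replace

# Hypothetical model of per-column statistics; field names are illustrative only.
@dataclass(frozen=True)
class ColumnStats:
    type_name: str
    min_value: int
    max_value: int
    ndv: int  # number of distinct values

# Byte widths of the integer types, used to tell a widening cast from a narrowing one.
_INTEGER_WIDTH = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

def upcast_stats(stats: ColumnStats, target_type: str) -> ColumnStats:
    """Return stats for the column after a widening integer cast.

    Every source value is representable in the wider type, so the cast is
    one-to-one and order-preserving: min, max, and ndv carry over unchanged.
    """
    src_width = _INTEGER_WIDTH[stats.type_name]
    dst_width = _INTEGER_WIDTH[target_type]
    if dst_width < src_width:
        # A narrowing cast can overflow, so the stats would no longer be valid.
        raise ValueError(f"{stats.type_name} -> {target_type} is not an upcast")
    return replace(stats, type_name=target_type)

# Made-up numbers for an int key column that is joined against a bigint key.
suppkey_int = ColumnStats("int", min_value=1, max_value=10_000_000, ndv=10_000_000)
suppkey_bigint = upcast_stats(suppkey_int, "bigint")
```

With stats preserved like this, the cast side of the join would still expose an NDV to compare against the other key, which is the information a PK/FK inference would need.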
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)