[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed

Hive QA (JIRA) Tue, 18 Jul 2017 22:04:13 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092577#comment-16092577
 ]


Hive QA commented on HIVE-17114:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877923/HIVE-17114.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 11073 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[constprog_semijoin]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_2]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_ts_stats_for_mapjoin]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby3] 
(batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join_filter] 
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_10] 
(batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_13] 
(batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_15] 
(batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_16] 
(batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_7] 
(batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_8] 
(batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_9] 
(batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_cast_constant]
 (batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct]
 (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_aggregate]
 (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_distinct_2] 
(batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_groupby_3] 
(batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5] 
(batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_string_concat]
 (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_12] 
(batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13] 
(batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14] 
(batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_15] 
(batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_16] 
(batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress]
 (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin]
 (batchId=131)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6082/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6082/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6082/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 43 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877923 - PreCommit-HIVE-Build

> HoS: Possible skew in shuffling when data is not really skewed
> --------------------------------------------------------------
>
>                 Key: HIVE-17114
>                 URL: https://issues.apache.org/jira/browse/HIVE-17114
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>            Priority: Minor
>         Attachments: HIVE-17114.1.patch
>
>
> Observed in HoS and may apply to other engines as well.
> When we join 2 tables on a single int key, we use the key itself as hash code 
> in {{ObjectInspectorUtils.hashCode}}:
> {code}
>       case INT:
>         return ((IntObjectInspector) poi).get(o);
> {code}
> Suppose the keys are different but are all some multiples of 10. And if we 
> choose 10 as #reducers, the shuffle will be skewed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed

Reply via email to