[ https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016972#comment-14016972 ]
Hive QA commented on HIVE-4867: ------------------------------- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648073/HIVE-4867.3.patch.txt {color:red}ERROR:{color} -1 due to 58 failed/errored test(s), 5510 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_like_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_fs_overwrite org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_push_or org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_union_remove_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_union_remove_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5 org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/379/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/379/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-379/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 58 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12648073 > Deduplicate columns appearing in both the key list and value list of > ReduceSinkOperator > --------------------------------------------------------------------------------------- > > Key: HIVE-4867 > URL: https://issues.apache.org/jira/browse/HIVE-4867 > Project: Hive > Issue Type: Improvement > Reporter: Yin Huai > Assignee: Navis > Attachments: HIVE-4867.1.patch.txt, HIVE-4867.2.patch.txt, > HIVE-4867.3.patch.txt, source_only.txt > > > A ReduceSinkOperator emits data in the format of keys and values. Right now, > a column may appear in both the key list and value list, which result in > unnecessary overhead for shuffling. > Example: > We have a query shown below ... > {code:sql} > explain select ss_ticket_number from store_sales cluster by ss_ticket_number; > {\code} > The plan is ... > {code} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 is a root stage > STAGE PLANS: > Stage: Stage-1 > Map Reduce > Alias -> Map Operator Tree: > store_sales > TableScan > alias: store_sales > Select Operator > expressions: > expr: ss_ticket_number > type: int > outputColumnNames: _col0 > Reduce Output Operator > key expressions: > expr: _col0 > type: int > sort order: + > Map-reduce partition columns: > expr: _col0 > type: int > tag: -1 > value expressions: > expr: _col0 > type: int > Reduce Operator Tree: > Extract > File Output Operator > compressed: false > GlobalTableId: 0 > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > Stage: Stage-0 > Fetch Operator > limit: -1 > {\code} > The column 'ss_ticket_number' is in both the key list and value list of the > ReduceSinkOperator. The type of ss_ticket_number is int. For this case, > BinarySortableSerDe will introduce 1 byte more for every int in the key. > LazyBinarySerDe will also introduce overhead when recording the length of a > int. For every int, 10 bytes should be a rough estimation of the size of data > emitted from the Map phase. -- This message was sent by Atlassian JIRA (v6.2#6252)