[
https://issues.apache.org/jira/browse/HIVE-18049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249218#comment-16249218
]
Hive QA commented on HIVE-18049:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12897288/HIVE-18049.3.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11374 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] (batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=102)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges (batchId=281)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7785/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7785/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7785/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12897288 - PreCommit-HIVE-Build
> Enable Hive on Tez to provide globally sorted clustered table
> -------------------------------------------------------------
>
> Key: HIVE-18049
> URL: https://issues.apache.org/jira/browse/HIVE-18049
> Project: Hive
> Issue Type: Improvement
> Components: Hive, Tez
> Reporter: LingXiao Lan
> Fix For: 2.1.1
>
> Attachments: CombinedPartitioner.txt, HIVE-18049.1.patch,
> tez-0.8.5.txt
>
>
> {code:sql}
> CREATE TABLE `test`(
> `time` int,
> `userid` bigint)
> CLUSTERED BY (
> userid)
> SORTED BY (
> userid ASC)
> INTO 4 BUCKETS
> ;
> {code}
> When data is inserted into this table, it is automatically sorted into 4
> buckets. But because Hive uses a hash partitioner by default, the data is
> sorted only within each bucket, not across buckets.
> Sometimes we need the data to be globally sorted, for example to optimize
> indexing.
> If we sample the table first and use TotalOrderPartitioner, this can be
> done. The difficulty is deciding automatically when to use
> TotalOrderPartitioner and when not to, because an insertion query can be
> complex, which results in a complex DAG in Tez.
> I have implemented a temporary version. It uses a custom partitioner that
> combines the hash partitioner and the total order partitioner. A physical
> optimizer is added to Hive to decide which partitioner to use. However, to
> reduce the workload, this version modifies the Tez source code, which is not
> strictly necessary.
> I'm wondering if we can implement a more general version that addresses this
> issue.
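For readers unfamiliar with the distinction drawn in the description, the following is a minimal sketch of why hash partitioning sorts only within buckets while a sampled, range-based (total-order style) partitioner yields a global order. It is not Hive, Tez, or Hadoop code and does not use the TotalOrderPartitioner API; every function name here is hypothetical, and `key % num_buckets` is only a stand-in for a real hash function.

```python
import bisect


def hash_bucket(key, num_buckets):
    """Hash-style assignment (modulo is a stand-in for a real hash)."""
    return key % num_buckets


def split_points(sample, num_buckets):
    """Pick num_buckets - 1 boundaries from a sample of the keys."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // num_buckets]
            for i in range(1, num_buckets)]


def range_bucket(key, boundaries):
    """Range assignment: a larger key never lands in a lower bucket."""
    return bisect.bisect_right(boundaries, key)


def write_buckets(keys, num_buckets, assign):
    """Route each key to a bucket, then sort every bucket on its own."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[assign(key)].append(key)
    return [sorted(bucket) for bucket in buckets]


def globally_sorted(buckets):
    """True if reading the buckets in order gives a sorted sequence."""
    flat = [key for bucket in buckets for key in bucket]
    return flat == sorted(flat)


if __name__ == "__main__":
    keys = list(range(100))            # illustrative userid values
    bounds = split_points(keys[::5], 4)  # sample every fifth key
    hashed = write_buckets(keys, 4, lambda k: hash_bucket(k, 4))
    ranged = write_buckets(keys, 4, lambda k: range_bucket(k, bounds))
    print("hash partitioning globally sorted: ", globally_sorted(hashed))
    print("range partitioning globally sorted:", globally_sorted(ranged))
```

The quality of the sample affects only how evenly the buckets fill; the global order holds for any sorted list of boundaries, which is why the sampling step can be cheap.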