[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

Hive QA (JIRA) Fri, 01 Sep 2017 13:29:27 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151097#comment-16151097
 ]


Hive QA commented on HIVE-16811:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885000/HIVE-16811.16.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11024 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_smb_mapjoin_14]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_10]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_9]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_7]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization2]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin] 
(batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_14]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_6]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_multi_insert]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_non_string_partition]
 (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6645/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6645/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6645/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885000 - PreCommit-HIVE-Build

> Estimate statistics in absence of stats
> ---------------------------------------
>
>                 Key: HIVE-16811
>                 URL: https://issues.apache.org/jira/browse/HIVE-16811
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>         Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently Join ordering completely bails out in absence of statistics and 
> this could lead to bad joins such as cross joins.
> e.g. following select query will produce cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY      INT,
>                                 L_PARTKEY       INT,
>                                 L_SUPPKEY       INT,
>                                 L_LINENUMBER    INT,
>                                 L_QUANTITY      DOUBLE,
>                                 L_EXTENDEDPRICE DOUBLE,
>                                 L_DISCOUNT      DOUBLE,
>                                 L_TAX           DOUBLE,
>                                 L_RETURNFLAG    STRING,
>                                 L_LINESTATUS    STRING,
>                                 l_shipdate      STRING,
>                                 L_COMMITDATE    STRING,
>                                 L_RECEIPTDATE   STRING,
>                                 L_SHIPINSTRUCT  STRING,
>                                 L_SHIPMODE      STRING,
>                                 L_COMMENT       STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
>     p_partkey INT,
>     p_name STRING,
>     p_mfgr STRING,
>     p_brand STRING,
>     p_type STRING,
>     p_size INT,
>     p_container STRING,
>     p_retailprice DOUBLE,
>     p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent join ordering algorithm to bail out and come up 
> with join at least better than cross join 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

Reply via email to