[
https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150757#comment-14150757
]
Hive QA commented on HIVE-8196:
-------------------------------
{color:red}Overall{color}: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12671549/HIVE-8196.2.patch
{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6359 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
{noformat}
Test results:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1016/testReport
Console output:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1016/console
Test logs:
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1016/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12671549
> Joining on partition columns with fetch column stats enabled results it very
> small CE which negatively affects query performance
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-8196
> URL: https://issues.apache.org/jira/browse/HIVE-8196
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Prasanth J
> Priority: Blocker
> Labels: performance
> Fix For: 0.14.0
>
> Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch
>
>
> To make the best out of dynamic partition pruning joins should be on the
> partitioning columns which results in dynamically pruning the partitions from
> the fact table based on the qualifying column keys from the dimension table,
> this type of joins negatively effects on cardinality estimates with fetch
> column stats enabled.
> Currently we don't have statistics for partition columns and as a result NDV
> is set to row count, doing that negatively affects the estimated join
> selectivity from the join.
> Workaround is to capture statistics for partition columns or use number of
> partitions incase dynamic partitioning is used.
> In StatsUtils.getColStatisticsFromExpression is where count distincts gets
> set to row count
> {code}
> if (encd.getIsPartitionColOrVirtualCol()) {
> // vitual columns
> colType = encd.getTypeInfo().getTypeName();
> countDistincts = numRows;
> oi = encd.getWritableObjectInspector();
> {code}
> Query used to repro the issue :
> {code}
> set hive.stats.fetch.column.stats=true;
> set hive.tez.dynamic.partition.pruning=true;
> explain select d_date
> from store_sales, date_dim
> where
> store_sales.ss_sold_date_sk = date_dim.d_date_sk and
> date_dim.d_year = 1998;
> {code}
> Plan
> {code}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 2 (BROADCAST_EDGE)
> DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> filterExpr: ss_sold_date_sk is not null (type: boolean)
> Statistics: Num rows: 550076554 Data size: 47370018816
> Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {ss_sold_date_sk}
> 1 {d_date_sk} {d_date}
> keys:
> 0 ss_sold_date_sk (type: int)
> 1 d_date_sk (type: int)
> outputColumnNames: _col22, _col26, _col28
> input vertices:
> 1 Map 2
> Statistics: Num rows: 652 Data size: 66504 Basic stats:
> COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (_col22 = _col26) (type: boolean)
> Statistics: Num rows: 326 Data size: 33252 Basic stats:
> COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col28 (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 326 Data size: 30644 Basic
> stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 326 Data size: 30644 Basic
> stats: COMPLETE Column stats: COMPLETE
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: (d_date_sk is not null and (d_year = 1998))
> (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic
> stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (d_date_sk is not null and (d_year = 1998))
> (type: boolean)
> Statistics: Num rows: 652 Data size: 66504 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: d_date_sk (type: int)
> sort order: +
> Map-reduce partition columns: d_date_sk (type: int)
> Statistics: Num rows: 652 Data size: 66504 Basic stats:
> COMPLETE Column stats: COMPLETE
> value expressions: d_date (type: string)
> Select Operator
> expressions: d_date_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 652 Data size: 0 Basic stats:
> PARTIAL Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 652 Data size: 0 Basic stats:
> PARTIAL Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: store_sales
> Partition key expr: ss_sold_date_sk
> Statistics: Num rows: 652 Data size: 0 Basic stats:
> PARTIAL Column stats: COMPLETE
> Target column: ss_sold_date_sk
> Target Vertex: Map 1
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)