[ https://issues.apache.org/jira/browse/HIVE-22538?focusedWorklogId=377700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377700 ]

ASF GitHub Bot logged work on HIVE-22538:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/20 15:29
            Start Date: 27/Jan/20 15:29
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on pull request #877: HIVE-22538: RS deduplication does not always enforce hive.optimize.reducededuplication.min.reducer
URL: https://github.com/apache/hive/pull/877#discussion_r371307699
 
 

 ##########
 File path: ql/src/test/results/clientpositive/autoColumnStats_4.q.out
 ##########
 @@ -128,16 +127,20 @@ STAGE PLANS:
             Statistics: Num rows: 10 Data size: 1728 Basic stats: COMPLETE Column stats: COMPLETE
             Group By Operator
               aggregations: compute_stats(a, 'hll'), compute_stats(b, 'hll')
-              minReductionHashAggr: 0.99
-              mode: hash
+              mode: complete
 
 Review comment:
   Enabling parallelism when inserting eliminates this change.
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 377700)
    Time Spent: 1h 10m  (was: 1h)

> RS deduplication does not always enforce hive.optimize.reducededuplication.min.reducer
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-22538
>                 URL: https://issues.apache.org/jira/browse/HIVE-22538
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22538.2.patch, HIVE-22538.3.patch, 
> HIVE-22538.4.patch, HIVE-22538.5.patch, HIVE-22538.6.patch, 
> HIVE-22538.6.patch, HIVE-22538.7.patch, HIVE-22538.8.patch, HIVE-22538.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For transactional tables, that property might be overridden to 1, which can 
> lead to merging the final aggregation into a single stage (hence leading to 
> performance degradation). For instance, when automatic gathering of column 
> stats is enabled, this can happen for the following query:
> {code}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> EXPLAIN
> CREATE TABLE x STORED AS ORC TBLPROPERTIES('transactional'='true') AS
> SELECT * FROM SRC x CLUSTER BY x.key;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
