[jira] [Commented] (HIVE-20260) NDV of a column shouldn't be scaled when row count is changed by filter on another column

Hive QA (JIRA) Wed, 01 Aug 2018 11:18:50 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-20260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565758#comment-16565758
 ]


Hive QA commented on HIVE-20260:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m  4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 37s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 40s{color} | {color:red} ql: The patch generated 3 new + 21 unchanged - 28 fixed = 24 total (was 49) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 4 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  5m  0s{color} | {color:red} ql generated 7 new + 2294 unchanged - 7 fixed = 2301 total (was 2301) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 14s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Boxing/unboxing to parse a primitive org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long)  At StatsRulesProcFactory.java:[line 935] |
|  |  Boxing/unboxing to parse a primitive org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long)  At StatsRulesProcFactory.java:[line 956] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Byte(String) constructor; use Byte.valueOf(String) instead  At StatsRulesProcFactory.java:[line 891] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Integer(String) constructor; use Integer.valueOf(String) instead  At StatsRulesProcFactory.java:[line 935] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Long(String) constructor; use Long.valueOf(String) instead  At StatsRulesProcFactory.java:[line 956] |
|  |  org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long) invokes inefficient new Short(String) constructor; use Short.valueOf(String) instead  At StatsRulesProcFactory.java:[line 910] |
|  |  Comparison of String objects using == or != in org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateComparator(Statistics, AnnotateStatsProcCtx, ExprNodeGenericFuncDesc, long)  At StatsRulesProcFactory.java:[line 931] |
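
The three FindBugs categories reported above (wrapper constructors, boxing just to parse a primitive, and String comparison with == or !=) usually have mechanical fixes. The following is a minimal sketch of those fixes, not the actual code in StatsRulesProcFactory; the class BoundParsing and all of its method names are hypothetical and exist only for illustration.

```java
import java.util.Objects;

// Hypothetical helper for illustration; not part of Hive.
final class BoundParsing {
    private BoundParsing() {}

    // Parse directly to a primitive. Unlike `new Long(s)` followed by
    // unboxing, no intermediate wrapper object is allocated.
    static long parseLongBound(String literal) {
        return Long.parseLong(literal.trim());
    }

    static int parseIntBound(String literal) {
        return Integer.parseInt(literal.trim());
    }

    static short parseShortBound(String literal) {
        return Short.parseShort(literal.trim());
    }

    static byte parseByteBound(String literal) {
        return Byte.parseByte(literal.trim());
    }

    // When a boxed value is really required, valueOf is preferred over the
    // constructor because it may return a cached instance.
    static Long boxedLongBound(String literal) {
        return Long.valueOf(literal.trim());
    }

    // Compare strings by content. `==` compares object references and only
    // works by accident when both strings happen to be interned.
    static boolean isType(String colType, String expected) {
        return Objects.equals(colType, expected);
    }
}
```

If the caller only needs the numeric value, the parseXxx variants are sufficient; valueOf is relevant only where an object is required, for example as a map value.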
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-12984/dev-support/hive-personality.sh |
| git revision | master / 4d43695 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/diff-checkstyle-ql.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/whitespace-eol.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/whitespace-tabs.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus/new-findbugs-ql.html |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-12984/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> NDV of a column shouldn't be scaled when row count is changed by filter on 
> another column
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-20260
>                 URL: https://issues.apache.org/jira/browse/HIVE-20260
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20260.01.patch, HIVE-20260.01wip01.patch, 
> HIVE-20260.01wip02.patch, HIVE-20260.01wip03.patch
>
>
> HIVE-17465 introduced progressive scaling of row counts in the presence of 
> multiple filters. HIVE-19500 improved on that by also scaling column stats 
> (NDV) in such scenarios. However, the scaling should take into account which 
> column is used in each filter expression rather than scaling for all filters. 
> For example, given the filter a = 1 and b = 2, the NDV of column b should not 
> be scaled down by row count changes caused by a = 1.
> Another way to say this is that the NDV of a particular column should be 
> updated at the end of the row count computation for that operator.
> Here are the possible cases where our estimates can be accurate (or close to 
> accurate):
> {code}
> case 1 - (d_year = 2001 and d_moy=1)
> case 2 - (d_year = 2001 and d_year IN (2001, 2002))
> case 3 - (d_year = 2001 and d_moy = 1 and d_dom = 1)
> case 4 - (d_date IN ('1999-01-02', '1999-01-02'))
> case 5 - (d_date = '1999-01-01')
> {code}
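
The idea in the description can be illustrated with a small sketch. This is not Hive's implementation: the class NdvSketch, its Estimate type, and the simple "rows / NDV" selectivity rule for equality predicates are all assumptions made for illustration. Each equality predicate reduces the row count and pins only its own column's NDV to 1; the NDVs of other columns are left alone during the computation and are only capped by the final row count at the end.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the selectivity rule are assumptions.
final class NdvSketch {
    private NdvSketch() {}

    // Estimated row count plus per-column NDV after applying a conjunction.
    static final class Estimate {
        final long rows;
        final Map<String, Long> ndv;

        Estimate(long rows, Map<String, Long> ndv) {
            this.rows = rows;
            this.ndv = ndv;
        }
    }

    // Applies a conjunction of equality predicates (one per listed column).
    static Estimate applyEqualityConjunction(
            long inputRows, Map<String, Long> inputNdv, List<String> equalityColumns) {
        Map<String, Long> ndv = new LinkedHashMap<>(inputNdv);
        long rows = inputRows;
        for (String col : equalityColumns) {
            // A column without stats is treated as NDV 1, i.e. no reduction.
            long distinct = Math.max(1L, ndv.getOrDefault(col, 1L));
            rows = Math.max(1L, rows / distinct);
            // Only the filtered column's NDV changes at this step.
            ndv.put(col, 1L);
        }
        // At the end of the row count computation, cap every NDV by the
        // final row count: a column cannot have more distinct values than rows.
        final long finalRows = rows;
        ndv.replaceAll((col, n) -> Math.min(n, finalRows));
        return new Estimate(rows, ndv);
    }
}
```

With made-up inputs of 1000 rows and NDVs a = 10, b = 20, c = 5, the filter a = 1 yields 100 rows and leaves the NDV of b at 20, whereas a purely proportional scaling by the row count ratio would have reduced it to 2. Adding b = 2 yields 5 rows, with the NDV of c capped at 5.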



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
