[ 
https://issues.apache.org/jira/browse/HIVE-27082?focusedWorklogId=857361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-857361
 ]

ASF GitHub Bot logged work on HIVE-27082:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Apr/23 12:34
            Start Date: 17/Apr/23 12:34
    Worklog Time Spent: 10m 
      Work Description: sonarcloud[bot] commented on PR #4239:
URL: https://github.com/apache/hive/pull/4239#issuecomment-1511256265

   Kudos, SonarCloud Quality Gate passed!    [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4239)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4239&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4239&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4239&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4239&resolved=false&types=CODE_SMELL) (rating A)
   
   [No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4239&metric=coverage&view=list)
   [No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4239&metric=duplicated_lines_density&view=list)

Issue Time Tracking
-------------------

    Worklog Id:     (was: 857361)
    Time Spent: 20m  (was: 10m)

> AggregateStatsCache.findBestMatch() in Metastore should test the inclusion of 
> default partition name
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27082
>                 URL: https://issues.apache.org/jira/browse/HIVE-27082
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 3.1.3, 4.0.0-alpha-2
>            Reporter: Sungwoo Park
>            Assignee: Seonggon Namgung
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This JIRA deals with non-deterministic behavior of Hive in generating DAGs.
> The non-deterministic behavior of Hive in generating DAGs is due to the logic 
> in AggregateStatsCache.findBestMatch() called from AggregateStatsCache.get(), 
> as well as the disproportionate distribution of Nulls in 
> HIVE_DEFAULT_PARTITION.
> Here is what is happening in the case of the TPC-DS dataset. Let us use 
> web_sales table and ws_web_site_sk column in the 10TB TPC-DS dataset as a 
> running example.
> In the course of running TPC-DS queries, Hive asks MetaStore about the column 
> statistics of 1823 partNames in the web_sales/ws_web_site_sk combination, 
> either without HIVE_DEFAULT_PARTITION or with HIVE_DEFAULT_PARTITION.
> --- Without HIVE_DEFAULT_PARTITION, it reports a total of 901180 nulls.
> --- With HIVE_DEFAULT_PARTITION, however, it reports a total of 1800087 nulls, 
> almost twice as many.
> The first call to MetaStore returns the correct result, but all subsequent 
> requests are likely to return the same result from the cache, irrespective of 
> the inclusion of HIVE_DEFAULT_PARTITION. This is because 
> AggregateStatsCache.findBestMatch() treats HIVE_DEFAULT_PARTITION in the same 
> way as other partNames, and the difference in the size of partNames[] is just 
> 1. The outcome depends on the duration of intervening queries, so everything 
> is now non-deterministic.
> If a wrong value of numNulls is returned, Hive generates a different DAG 
> which may take much longer than the correct one. The problem is 
> particularly pronounced here because of the huge number of nulls in 
> HIVE_DEFAULT_PARTITION. It is ironic to see that the query optimizer is so 
> efficient that a single wrong guess of numNulls creates a very inefficient 
> DAG.
> Note that this behavior cannot be avoided by setting 
> hive.metastore.aggregate.stats.cache.max.variance to zero because the 
> difference in the number of partNames[] between the argument and the entry in 
> the cache is just 1.
> So, AggregateStatsCache.findBestMatch() should treat HIVE_DEFAULT_PARTITION 
> in a special way, by not returning the result in the cache if there is a 
> difference in the inclusion of partName HIVE_DEFAULT_PARTITION (or should 
> provide the user with an option to activate this feature).
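
The proposed check can be illustrated with a minimal Java sketch. This is not the code in AggregateStatsCache or in PR #4239; the class and method names below are hypothetical, the cache entry is modelled as a plain list of partNames, and the partition column ws_sold_date_sk is an assumption about how web_sales is partitioned. The only Hive-specific value used is the stock default partition name, __HIVE_DEFAULT_PARTITION__, which is configurable in a real deployment.

```java
import java.util.List;

public class DefaultPartitionGuard {
    // Stock value of the default partition name; configurable in a real
    // deployment, so treat this constant as an assumption.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // True if any partName (e.g. "col=value") refers to the default partition.
    static boolean includesDefaultPartition(List<String> partNames) {
        for (String partName : partNames) {
            if (partName.contains(DEFAULT_PARTITION_NAME)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical guard: a cached aggregate is a candidate only when the
    // request and the cached entry agree on including the default partition.
    static boolean mayReuseCachedEntry(List<String> requested, List<String> cached) {
        return includesDefaultPartition(requested) == includesDefaultPartition(cached);
    }

    public static void main(String[] args) {
        List<String> without = List.of("ws_sold_date_sk=2450816");
        List<String> with = List.of(
            "ws_sold_date_sk=2450816",
            "ws_sold_date_sk=__HIVE_DEFAULT_PARTITION__");

        System.out.println(mayReuseCachedEntry(without, with));    // false
        System.out.println(mayReuseCachedEntry(without, without)); // true
    }
}
```

With the figures reported above, the default partition alone accounts for 1800087 - 901180 = 898907 nulls, so a guard of this kind would keep a cached aggregate from being reused across requests whose null counts differ by roughly a factor of two.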



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
