[ 
https://issues.apache.org/jira/browse/HIVE-29516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-29516:
----------------------------------
    Labels: pull-request-available  (was: )

> NullPointerException in StatsUtils.updateStats when column statistics are 
> unavailable during semijoin optimization
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29516
>                 URL: https://issues.apache.org/jira/browse/HIVE-29516
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor, Statistics
>    Affects Versions: 4.2.0
>            Reporter: Shubham Sharma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> h3. Problem
> Query compilation fails with {{NullPointerException}} in 
> {{StatsUtils.updateStats()}} when column statistics are not available for 
> certain operators. This occurs during the semijoin optimization phase in 
> {{TezCompiler.removeSemijoinOptimizationByBenefit()}}.
> The issue is reproducible with TPC-DS queries at scale factors of 100GB or 
> higher, where column-level statistics may be incomplete or unavailable for 
> some tables.
>  
>  
> {code:java}
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2067)
>     at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1982)
>     at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:539)
>     at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:238)
>     at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:174)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:12521)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12739)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460)
>     ... {code}
> h3. How to Reproduce
>  # Generate a TPC-DS dataset at 100GB scale or larger.
>  # Ensure column statistics are not fully computed for all tables.
>  # Run TPC-DS queries that involve semijoin optimizations (queries with 
> subqueries or complex joins, e.g. 10, 17, 19, 23, 24, 25, 29, 32).
>  # Observe the NPE during query compilation.
>  
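
For illustration only: the description above does not show the code at StatsUtils.java:2067 or the proposed fix. The sketch below uses hypothetical stand-in classes and methods (Stats, ColStat, updateUnguarded, updateGuarded; none of these are Hive APIs) and assumes the failure comes from iterating a column-statistics list that is null. It shows that general failure mode and one possible defensive guard, not the actual patch.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hive's classes.
public class NullSafeStatsSketch {

    /** Simplified per-column statistic: a column name and a distinct-value count. */
    static final class ColStat {
        final String name;
        long distinctCount;

        ColStat(String name, long distinctCount) {
            this.name = name;
            this.distinctCount = distinctCount;
        }
    }

    /** Simplified operator statistics; colStats is null when column stats are unavailable. */
    static final class Stats {
        long numRows;
        List<ColStat> colStats; // may be null (assumption made for this sketch)

        Stats(long numRows, List<ColStat> colStats) {
            this.numRows = numRows;
            this.colStats = colStats;
        }
    }

    /** Unguarded update: throws NullPointerException when colStats is null. */
    static void updateUnguarded(Stats stats, long newNumRows) {
        for (ColStat cs : stats.colStats) {
            // A column cannot have more distinct values than there are rows.
            cs.distinctCount = Math.min(cs.distinctCount, newNumRows);
        }
        stats.numRows = newNumRows;
    }

    /** Guarded update: skips the column-level adjustment when colStats is null. */
    static void updateGuarded(Stats stats, long newNumRows) {
        if (stats.colStats != null) {
            for (ColStat cs : stats.colStats) {
                cs.distinctCount = Math.min(cs.distinctCount, newNumRows);
            }
        }
        // The row count is still updated even without column statistics.
        stats.numRows = newNumRows;
    }
}
```

With a Stats object whose colStats is null, updateUnguarded throws a NullPointerException, while updateGuarded only updates the row count. Whether silently skipping the column-level update is the right behavior for the semijoin benefit estimate is a design question this sketch does not answer.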



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
