John LeBrun created HIVE-21853:
----------------------------------
Summary: NPE in StatsUtils.getWritableSize() when value passed in
is null
Key: HIVE-21853
URL: https://issues.apache.org/jira/browse/HIVE-21853
Project: Hive
Issue Type: Bug
Environment: Hortonworks
* Ambari version 2.7.3.0
* HDP stack version 3.1
* HDP stack repo version 3.1.0.0
* stack vdf version 3.1.0.0-78
Reporter: John LeBrun
getWritableSize(ObjectInspector oi, Object value) method in
org.apache.hadoop.hive.ql.stats.StatsUtils class fails with NPE when 2nd
parameter (Object value) is null.
Attached is patch with unit test and fix
Issue was originally found when running UDF query against Hortonworks cluster
with HDP 3.1 running Hive 3.1.0. The issue occurs when executing the UDF
against a cluster using the tez execution engine
beeline hive configurations
set hive.execution.engine=tez;
set hive.fetch.task.conversion=none;
Attached is sample code with an implementation of a simple UDF that duplicates
the behavior.
steps to reproduce
on a Hortonworks cluster with HDP 3.1 deployed
-start beeline Hive session
-set above hive configurations
-add jar containing UDF from sample code
-create table containing one string column
create table tmptable(col1 string)
insert into table tmptable values ('somestring')
-create function bugUdf as 'BugUDF';
-select bugUdf from tmptable;
this will result in a null pointer exception similar to this
ql.Driver ()) - FAILED: NullPointerException nulljava.lang.NullPointerException
at
org.apache.hadoop.hive.ql.stats.StatsUtils.getWritableSize(StatsUtils.java:1373)
at
org.apache.hadoop.hive.ql.stats.StatsUtils.getSizeOfStruct(StatsUtils.java:1356)
at
org.apache.hadoop.hive.ql.stats.StatsUtils.getSizeOfComplexTypes(StatsUtils.java:1212)
at
org.apache.hadoop.hive.ql.stats.StatsUtils.getAvgColLenOf(StatsUtils.java:1140)
at
org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1584)
at
org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1424)
at
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:196)
at
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
at
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
at
org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:397)
at
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:161)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:148)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12443)
at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1863)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1810)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1805)
at
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
at
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
at
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)