[ 
https://issues.apache.org/jira/browse/HIVE-11926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-11926:
-------------------------------
    Description: 
The cause is that StatsUtils uses String.startsWith to compare the 
VARCHAR/DECIMAL column type name with the serdeConstants type names, which are 
in lowercase, but the type names in stats might not be in lowercase. We ran 
into a case where the type name from TAB_COL_STATS/PART_COL_STATS was actually 
in uppercase (e.g. VARCHAR, DECIMAL) because these column stats were populated 
by other HMS clients such as Impala.
We need to change these type name comparisons to be case-insensitive.

  was:
If column stats are calculated and populated to HMS by a client such as 
Impala, the column type name stored in TAB_COL_STATS/PART_COL_STATS could be in 
uppercase (e.g. VARCHAR, DECIMAL). When Hive collects stats for these columns 
during optimization (with hive.stats.fetch.column.stats set to true), it 
throws an NPE. See the error message below:
{code}
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:103)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:379)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:366)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: null
at org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:636)
at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:623)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:180)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124)
....truncated
{code}

        Summary: Stats annotation might not extract stats for varchar/decimal 
columns  (was: NPE could occur in collectStatistics when column type is varchar)

Changed the JIRA title and description since the NPE won't happen in this version.

> Stats annotation might not extract stats for varchar/decimal columns
> --------------------------------------------------------------------
>
>                 Key: HIVE-11926
>                 URL: https://issues.apache.org/jira/browse/HIVE-11926
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Statistics
>    Affects Versions: 1.2.1
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>
> The cause is that StatsUtils uses String.startsWith to compare the 
> VARCHAR/DECIMAL column type name with the serdeConstants type names, which 
> are in lowercase, but the type names in stats might not be in lowercase. We 
> ran into a case where the type name from TAB_COL_STATS/PART_COL_STATS was 
> actually in uppercase (e.g. VARCHAR, DECIMAL) because these column stats were 
> populated by other HMS clients such as Impala.
> We need to change these type name comparisons to be case-insensitive.
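
For illustration only, a minimal sketch of the mismatch described above and one possible case-insensitive comparison. This is not the actual patch: the class TypeNameMatch and the helper startsWithIgnoreCase are hypothetical names, and the two constants are assumed to mirror the lowercase values in serdeConstants.

```java
public class TypeNameMatch {
    // Assumption: these mirror the lowercase type-name constants in serdeConstants.
    static final String VARCHAR_TYPE_NAME = "varchar";
    static final String DECIMAL_TYPE_NAME = "decimal";

    // Hypothetical helper: a prefix match that ignores case.
    // regionMatches returns false when typeName is shorter than prefix.
    static boolean startsWithIgnoreCase(String typeName, String prefix) {
        return typeName != null
            && typeName.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        // A type name as another HMS client might have stored it.
        String fromStats = "VARCHAR(10)";

        // Case-sensitive prefix check misses the uppercase name.
        System.out.println(fromStats.startsWith(VARCHAR_TYPE_NAME));            // false

        // Case-insensitive check matches both spellings.
        System.out.println(startsWithIgnoreCase(fromStats, VARCHAR_TYPE_NAME)); // true
        System.out.println(startsWithIgnoreCase("decimal(10,2)", DECIMAL_TYPE_NAME)); // true
    }
}
```

Lowercasing the stored type name once before comparing would be another way to get the same effect; which approach fits StatsUtils best is left to the patch.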



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)