[ https://issues.apache.org/jira/browse/HIVE-11926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chaoyu Tang updated HIVE-11926:
-------------------------------
Description:
Stats for VARCHAR/DECIMAL columns might not be extracted because StatsUtils
uses String.startsWith to compare the column type name against the
serdeConstants type names, which are in lowercase. The type names stored in the
stats might not be in lowercase. We ran into a case where the type name from
TAB_COL_STATS/PART_COL_STATS was in uppercase (e.g. VARCHAR, DECIMAL) because
the column stats had been populated by another HMS client such as Impala.
We need to change these type name comparisons to be case insensitive.
was:
If column stats are calculated and populated to HMS by a client such as Impala,
the column type name stored in TAB_COL_STATS/PART_COL_STATS could be in
uppercase (e.g. VARCHAR, DECIMAL). When Hive collects stats for these columns
during optimization (with hive.stats.fetch.column.stats set to true), it
throws an NPE. See the error message below:
{code}
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:103)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:379)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:366)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: null
at org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:636)
at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:623)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:180)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124)
....truncated
{code}
Summary: Stats annotation might not extract stats for varchar/decimal
columns (was: NPE could occur in collectStatistics when column type is varchar)
Changed the JIRA title and description since NPE won't happen in this version.
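For illustration only, the case-insensitive comparison asked for in the description could look roughly like the sketch below. It is not the actual patch: the class name, the helper startsWithIgnoreCase, and the two constants are hypothetical stand-ins for the lowercase type names kept in serdeConstants. It relies only on the standard String.regionMatches overload that takes an ignoreCase flag.

```java
class TypeNameMatch {
    // Hypothetical stand-ins for the lowercase type-name constants in serdeConstants.
    static final String VARCHAR_TYPE_NAME = "varchar";
    static final String DECIMAL_TYPE_NAME = "decimal";

    // Returns true if typeName begins with prefix, ignoring case.
    // Null inputs are treated as "no match" rather than throwing.
    static boolean startsWithIgnoreCase(String typeName, String prefix) {
        if (typeName == null || prefix == null) {
            return false;
        }
        // regionMatches returns false if typeName is shorter than prefix.
        return typeName.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        // A type name as it might be written by another HMS client.
        String fromStats = "VARCHAR(10)";

        // Case-sensitive check misses the uppercase name.
        System.out.println(fromStats.startsWith(VARCHAR_TYPE_NAME));                  // false
        // Case-insensitive check matches it.
        System.out.println(startsWithIgnoreCase(fromStats, VARCHAR_TYPE_NAME));       // true
        System.out.println(startsWithIgnoreCase("DECIMAL(10,2)", DECIMAL_TYPE_NAME)); // true
        System.out.println(startsWithIgnoreCase("string", VARCHAR_TYPE_NAME));        // false
    }
}
```

Lowercasing the stored type name before calling startsWith would work as well; regionMatches is used here only because it avoids allocating a new string and does not depend on the default locale.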
> Stats annotation might not extract stats for varchar/decimal columns
> --------------------------------------------------------------------
>
> Key: HIVE-11926
> URL: https://issues.apache.org/jira/browse/HIVE-11926
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer, Statistics
> Affects Versions: 1.2.1
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
>
> Stats for VARCHAR/DECIMAL columns might not be extracted because StatsUtils
> uses String.startsWith to compare the column type name against the
> serdeConstants type names, which are in lowercase. The type names stored in
> the stats might not be in lowercase. We ran into a case where the type name
> from TAB_COL_STATS/PART_COL_STATS was in uppercase (e.g. VARCHAR, DECIMAL)
> because the column stats had been populated by another HMS client such as
> Impala.
> We need to change these type name comparisons to be case insensitive.