[
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sushanth Sowmyan updated HIVE-12083:
------------------------------------
Description:
In the fix for HIVE-10965, there is a short-circuit path that causes an empty
AggrStats object to be returned if partNames is empty or colNames is empty:
{code}
diff --git metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
index 0a56bac..ed810d2 100644
--- metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
+++ metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
@@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
   public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
       List<String> partNames, List<String> colNames, boolean useDensityFunctionForNDVEstimation)
       throws MetaException {
+    if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // Nothing to aggregate.
     long partsFound = partsFoundForPartitions(dbName, tableName, partNames, colNames);
     List<ColumnStatisticsObj> colStatsList;
     // Try to read from the cache first
{code}
This runs afoul of the thrift definition of AggrStats, which declares both fields as required, whereas new AggrStats() leaves colStats unset:
{code}
struct AggrStats {
  1: required list<ColumnStatisticsObj> colStats,
  2: required i64 partsFound // number of partitions for which stats were found
}
{code}
Thus, we get errors as follows:
{noformat}
2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0)
    at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
Normally this would not occur, because HIVE-10965 also adds a client-side guard
that skips the metastore call entirely when colNames.isEmpty(). However, there
is no such guard for an empty partNames, and the metastore would still hit this
error if the thrift call were made directly, as would happen with a client from
an older version that predates the patch.
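To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use the real Thrift-generated classes: AggrStatsSketch is a hypothetical stand-in for AggrStats that models only the null check on the required colStats field, and the column stats are simplified to strings. It contrasts the short-circuit shown above with one possible alternative that populates both fields; this is an illustration under those assumptions, not necessarily the change committed for this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Thrift-generated AggrStats class.
// Only the required-field check on colStats is modelled here.
final class AggrStatsSketch {
    private final List<String> colStats; // required list; null means "unset"
    private final long partsFound;       // required i64

    // Like "new AggrStats()": no field is populated.
    AggrStatsSketch() {
        this(null, 0L);
    }

    AggrStatsSketch(List<String> colStats, long partsFound) {
        this.colStats = colStats;
        this.partsFound = partsFound;
    }

    long getPartsFound() {
        return partsFound;
    }

    // Mirrors the kind of check that fails when the result is written out.
    void validate() {
        if (colStats == null) {
            throw new IllegalStateException("Required field 'colStats' is unset!");
        }
    }

    // The short-circuit from the diff: returns an object with colStats unset.
    static AggrStatsSketch unsetResult() {
        return new AggrStatsSketch();
    }

    // One possible alternative (an assumption): an empty list and zero partitions found.
    static AggrStatsSketch emptyResult() {
        return new AggrStatsSketch(new ArrayList<String>(), 0L);
    }

    public static void main(String[] args) {
        emptyResult().validate(); // passes: both fields are populated
        try {
            unsetResult().validate();
        } catch (IllegalStateException e) {
            System.out.println("validation failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that "nothing to aggregate" still has to be expressed as a populated struct (an empty list and a count of zero) rather than as an object whose required fields were never set.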
> HIVE-10965 introduces thrift error if partNames or colNames are empty
> ---------------------------------------------------------------------
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
>