[
https://issues.apache.org/jira/browse/HIVE-26633?focusedWorklogId=817403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-817403
]
ASF GitHub Bot logged work on HIVE-26633:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Oct/22 00:20
Start Date: 17/Oct/22 00:20
Worklog Time Spent: 10m
Work Description: amansinha100 commented on code in PR #3674:
URL: https://github.com/apache/hive/pull/3674#discussion_r996527079
##########
common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java:
##########
@@ -50,8 +50,21 @@
 public class HiveAuthUtils {
   private static final Logger LOG = LoggerFactory.getLogger(HiveAuthUtils.class);
-  public static TTransport getSocketTransport(String host, int port, int loginTimeout) throws TTransportException {
Review Comment:
Since this is a public static method, it is possible that it is used by some
client programs and may be needed for backward compatibility. Can we keep this
signature as a wrapper that supplies the default -1 value? (A sketch of such a
wrapper follows below.)
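A minimal sketch of the suggested backward-compatible wrapper, assuming the new
overload added in this PR takes a trailing maxMessageSize parameter (the exact
signature is an assumption):
{code:java}
// Backward-compatible wrapper for existing callers; -1 is assumed to mean
// "fall back to the configured/default thrift max message size".
public static TTransport getSocketTransport(String host, int port, int loginTimeout)
    throws TTransportException {
  return getSocketTransport(host, port, loginTimeout, -1);
}
{code}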
##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -2913,7 +2913,10 @@ public static enum ConfVars {
     HIVE_STATS_MAX_NUM_STATS("hive.stats.max.num.stats", (long) 10000,
         "When the number of stats to be updated is huge, this value is used to control the number of \n" +
         " stats to be sent to HMS for update."),
-
+    HIVE_THRIFT_MAX_MESSAGE_SIZE("hive.thrift.max.message.size", "1gb",
Review Comment:
A couple of comments:
- There is also a hive.server2.thrift.max.message.size parameter, currently
set to 100MB, but it appears under the HTTP-over-thrift transport settings, so
the naming can get confusing. I think we should either consolidate the two
settings into one, or the existing config should have the 'http' string in its
name to avoid conflict. If we keep them separate, it would be useful to
understand whether they should be consistent with each other; I don't have the
full context of HTTP over thrift.
- For the default value, most other size-specific settings spell out the full
byte value, e.g. 100*1024*1024L for 100MB (see the sketch below). There are a
few settings that specify the units as you have done, but such instances seem
much rarer.
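For illustration, a minimal sketch of the full-bytes form the second comment
refers to, following the pattern of the neighboring ConfVars entries (the
description string here is hypothetical):
{code:java}
HIVE_THRIFT_MAX_MESSAGE_SIZE("hive.thrift.max.message.size",
    1024L * 1024 * 1024, // 1GB spelled out in bytes, like 100*1024*1024L elsewhere
    "Maximum message size in bytes that Hive thrift clients/servers will accept."),
{code}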
Issue Time Tracking
-------------------
Worklog Id: (was: 817403)
Time Spent: 0.5h (was: 20m)
> Make thrift max message size configurable
> -----------------------------------------
>
> Key: HIVE-26633
> URL: https://issues.apache.org/jira/browse/HIVE-26633
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 4.0.0-alpha-2
> Reporter: John Sherman
> Assignee: John Sherman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Since thrift >= 0.14, thrift enforces max message sizes through a
> TConfiguration object, as described here:
> [https://github.com/apache/thrift/blob/master/doc/specs/thrift-tconfiguration.md]
> By default, MaxMessageSize is set to 100MB.
> As a result, HMS clients may be unable to retrieve certain metadata for
> tables with a large number of partitions or other large metadata.
> For example, on a cluster configured with Kerberos between HS2 and HMS,
> querying a large table (10k partitions, 200 columns with names of 200
> characters) results in this backtrace:
> {code:java}
> org.apache.thrift.transport.TTransportException: MaxMessageSize reached
>   at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96)
>   at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97)
>   at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:390)
>   at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
>   at org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:63)
>   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464)
>   at org.apache.thrift.protocol.TBinaryProtocol.readByte(TBinaryProtocol.java:329)
>   at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:273)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:461)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:454)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema.read(FieldSchema.java:388)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1269)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1248)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor.read(StorageDescriptor.java:1110)
>   at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1270)
>   at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1205)
>   at org.apache.hadoop.hive.metastore.api.Partition.read(Partition.java:1062)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:420)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:399)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult.read(PartitionsByExprResult.java:335)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java)
> {code}
> Making this configurable (and defaulting to a higher value) would allow these
> tables to still be accessible.
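> A minimal sketch of how a client could raise the limit via the thrift >= 0.14
> TConfiguration API described above (the 1GB value and class name are
> illustrative, not part of the proposed patch):
> {code:java}
> import org.apache.thrift.TConfiguration;
> import org.apache.thrift.transport.TSocket;
> import org.apache.thrift.transport.TTransport;
> import org.apache.thrift.transport.TTransportException;
>
> public class LargeMessageSocketExample {
>   // Opens a TSocket whose MaxMessageSize is raised from the 100MB default to 1GB.
>   public static TTransport open(String host, int port) throws TTransportException {
>     TConfiguration conf = new TConfiguration(
>         1024 * 1024 * 1024,                     // maxMessageSize: 1GB
>         TConfiguration.DEFAULT_MAX_FRAME_SIZE,  // keep default frame size
>         TConfiguration.DEFAULT_RECURSION_DEPTH  // keep default recursion limit
>     );
>     TSocket socket = new TSocket(conf, host, port);
>     socket.open();
>     return socket;
>   }
> }
> {code}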