[
https://issues.apache.org/jira/browse/HIVE-26633?focusedWorklogId=817403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-817403
]
ASF GitHub Bot logged work on HIVE-26633:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Oct/22 00:20
Start Date: 17/Oct/22 00:20
Worklog Time Spent: 10m
Work Description: amansinha100 commented on code in PR #3674:
URL: https://github.com/apache/hive/pull/3674#discussion_r996527079
##########
common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java:
##########
@@ -50,8 +50,21 @@
 public class HiveAuthUtils {
   private static final Logger LOG = LoggerFactory.getLogger(HiveAuthUtils.class);
-  public static TTransport getSocketTransport(String host, int port, int loginTimeout) throws TTransportException {
Review Comment:
Since this is a public static method, it is possible that it is used by some
client programs and may be needed for backward compatibility. Can we keep this
signature as a wrapper that supplies the default -1 value? (A sketch of such a
wrapper follows below.)
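A minimal sketch of the suggested backward-compatible wrapper, assuming the new
overload added in this PR takes a trailing maxMessageSize parameter (the exact
signature is an assumption):
{code:java}
// Backward-compatible wrapper for existing callers; -1 is assumed to mean
// "fall back to the configured/default thrift max message size".
public static TTransport getSocketTransport(String host, int port, int loginTimeout)
    throws TTransportException {
  return getSocketTransport(host, port, loginTimeout, -1);
}
{code}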
##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -2913,7 +2913,10 @@ public static enum ConfVars {
     HIVE_STATS_MAX_NUM_STATS("hive.stats.max.num.stats", (long) 10000,
         "When the number of stats to be updated is huge, this value is used to control the number of \n" +
         " stats to be sent to HMS for update."),
-
+    HIVE_THRIFT_MAX_MESSAGE_SIZE("hive.thrift.max.message.size", "1gb",
Review Comment:
A couple of comments:
- There is also a hive.server2.thrift.max.message.size parameter, currently
set to 100MB, but it appears under the HTTP-over-thrift transport settings, so
the naming can get confusing. I think we should either consolidate the two
settings into one, or the existing config should have the 'http' string in its
name to avoid conflict. If we keep them separate, it would be useful to
understand whether they should be consistent with each other; I don't have the
full context of HTTP over thrift.
- For the default value, most other size-specific settings spell out the full
byte value, e.g. 100*1024*1024L for 100MB (see the sketch below). There are a
few settings that specify the units as you have done, but such instances seem
much rarer.
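For illustration, a minimal sketch of the full-bytes form the second comment
refers to, following the pattern of the neighboring ConfVars entries (the
description string here is hypothetical):
{code:java}
HIVE_THRIFT_MAX_MESSAGE_SIZE("hive.thrift.max.message.size",
    1024L * 1024 * 1024, // 1GB spelled out in bytes, like 100*1024*1024L elsewhere
    "Maximum message size in bytes that Hive thrift clients/servers will accept."),
{code}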
Issue Time Tracking
-------------------
Worklog Id: (was: 817403)
Time Spent: 0.5h (was: 20m)
> Make thrift max message size configurable
> -----------------------------------------
>
> Key: HIVE-26633
> URL: https://issues.apache.org/jira/browse/HIVE-26633
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 4.0.0-alpha-2
> Reporter: John Sherman
> Assignee: John Sherman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Since thrift >= 0.14, thrift enforces max message sizes through a
> TConfiguration object, as described here:
> [https://github.com/apache/thrift/blob/master/doc/specs/thrift-tconfiguration.md]
> By default, MaxMessageSize is set to 100MB.
> As a result, HMS clients may be unable to retrieve certain metadata for
> tables with a large number of partitions or other large metadata.
> For example, on a cluster configured with Kerberos between HS2 and HMS,
> querying a large table (10k partitions, 200 columns with names of 200
> characters) results in this backtrace:
> {code:java}
> org.apache.thrift.transport.TTransportException: MaxMessageSize reached
>   at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96)
>   at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97)
>   at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:390)
>   at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
>   at org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:63)
>   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464)
>   at org.apache.thrift.protocol.TBinaryProtocol.readByte(TBinaryProtocol.java:329)
>   at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:273)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:461)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:454)
>   at org.apache.hadoop.hive.metastore.api.FieldSchema.read(FieldSchema.java:388)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1269)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1248)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor.read(StorageDescriptor.java:1110)
>   at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1270)
>   at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1205)
>   at org.apache.hadoop.hive.metastore.api.Partition.read(Partition.java:1062)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:420)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:399)
>   at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult.read(PartitionsByExprResult.java:335)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java)
> {code}
> Making this configurable (and defaulting to a higher value) would allow these
> tables to still be accessible.
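> A minimal sketch of how a client could raise the limit via the thrift >= 0.14
> TConfiguration API described above (the 1GB value and class name are
> illustrative, not part of the proposed patch):
> {code:java}
> import org.apache.thrift.TConfiguration;
> import org.apache.thrift.transport.TSocket;
> import org.apache.thrift.transport.TTransport;
> import org.apache.thrift.transport.TTransportException;
>
> public class LargeMessageSocketExample {
>   // Opens a TSocket whose MaxMessageSize is raised from the 100MB default to 1GB.
>   public static TTransport open(String host, int port) throws TTransportException {
>     TConfiguration conf = new TConfiguration(
>         1024 * 1024 * 1024,                     // maxMessageSize: 1GB
>         TConfiguration.DEFAULT_MAX_FRAME_SIZE,  // keep default frame size
>         TConfiguration.DEFAULT_RECURSION_DEPTH  // keep default recursion limit
>     );
>     TSocket socket = new TSocket(conf, host, port);
>     socket.open();
>     return socket;
>   }
> }
> {code}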