jfsii commented on code in PR #3674:
URL: https://github.com/apache/hive/pull/3674#discussion_r997240526


##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -2913,7 +2913,10 @@ public static enum ConfVars {
     HIVE_STATS_MAX_NUM_STATS("hive.stats.max.num.stats", (long) 10000,
         "When the number of stats to be updated is huge, this value is used to 
control the number of \n" +
         " stats to be sent to HMS for update."),
-
+    HIVE_THRIFT_MAX_MESSAGE_SIZE("hive.thrift.max.message.size", "1gb",

Review Comment:
   For point 1 - I looked into the hive.server2.thrift.max.message.size - it 
has nothing to do with http. I wouldn't trust the grouping comments in 
HiveConf. It has been edited by so many developers with different styles it 
tends to be a bit messy imo. The "hive.server2.thrift.max.message.size" setting 
ends up getting used here:
   
https://github.com/apache/hive/blob/d17bd5bf12e62563cc35016dbb4445c89333d9a3/service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java#L102
   which then gets passed to this constructor in thrift:
   
https://github.com/apache/thrift/blob/2a93df80f27739ccabb5b885cb12a8dc7595ecdf/lib/java/src/org/apache/thrift/protocol/TBinaryProtocol.java#L103
   
   So it ends up being used to define these two limits for the binary protocol:
   ```/**
      * The maximum number of bytes to read from the transport for
      * variable-length fields (such as strings or binary) or {@link 
#NO_LENGTH_LIMIT} for
      * unlimited.
      */
     private final long stringLengthLimit_;
   
     /**
      * The maximum number of elements to read from the network for
      * containers (maps, sets, lists), or {@link #NO_LENGTH_LIMIT} for 
unlimited.
      */
     private final long containerLengthLimit_;
   ```
   
   I would argue the name "hive.server2.thrift.max.message.size" is itself 
misnamed, however we are stuck with it since it has been in the wild for so 
long. I am not sure it would be a good idea to use the same setting for 3 
different thrift limits.
   
   I am open to an alternative name for my setting though 
"hive.thrift.max.message.size" because it is confusing with the other named 
setting. If you think my above concerns about re-using the same setting for 
stringLengthLimit, containerLengthLimit and overall thrift client bytes 
received limit are overblown, I could be swayed.
   
   For point 2 - I'd argue my way is more consistent. I am using a 
SizeValidator() which from my quick check most of the settings that use a size 
validator, the default is a size with a unit. I'd also argue this is more 
user-friendly. The usages of SizeValidator() that do not specify a unit look to 
be using SizeValidator() incorrectly. For example - "hive.mv.files.thread" is 
not a size. I could not use SizeValidator and just accept an integer but I 
think that is removing a user friend way to set settings for something that is 
more engineer oriented.
   
   Both of the above points I can be swayed, I'm just sharing my thought 
process.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to