jfsii commented on code in PR #3674:
URL: https://github.com/apache/hive/pull/3674#discussion_r997240526
##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -2913,7 +2913,10 @@ public static enum ConfVars {
HIVE_STATS_MAX_NUM_STATS("hive.stats.max.num.stats", (long) 10000,
"When the number of stats to be updated is huge, this value is used to
control the number of \n" +
" stats to be sent to HMS for update."),
-
+ HIVE_THRIFT_MAX_MESSAGE_SIZE("hive.thrift.max.message.size", "1gb",
Review Comment:
For point 1 - I looked into the hive.server2.thrift.max.message.size - it
has nothing to do with http. I wouldn't trust the grouping comments in
HiveConf. It has been edited by so many developers with different styles it
tends to be a bit messy imo. The "hive.server2.thrift.max.message.size" setting
ends up getting used here:
https://github.com/apache/hive/blob/d17bd5bf12e62563cc35016dbb4445c89333d9a3/service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java#L102
which then gets passed to this constructor in thrift:
https://github.com/apache/thrift/blob/2a93df80f27739ccabb5b885cb12a8dc7595ecdf/lib/java/src/org/apache/thrift/protocol/TBinaryProtocol.java#L103
So it ends up being used to define these two limits for the binary protocol:
```/**
* The maximum number of bytes to read from the transport for
* variable-length fields (such as strings or binary) or {@link
#NO_LENGTH_LIMIT} for
* unlimited.
*/
private final long stringLengthLimit_;
/**
* The maximum number of elements to read from the network for
* containers (maps, sets, lists), or {@link #NO_LENGTH_LIMIT} for
unlimited.
*/
private final long containerLengthLimit_;
```
I would argue the name "hive.server2.thrift.max.message.size" is itself
misnamed, however we are stuck with it since it has been in the wild for so
long. I am not sure it would be a good idea to use the same setting for 3
different thrift limits.
I am open to an alternative name for my setting though
"hive.thrift.max.message.size" because it is confusing with the other named
setting. If you think my above concerns about re-using the same setting for
stringLengthLimit, containerLengthLimit and overall thrift client bytes
received limit are overblown, I could be swayed.
For point 2 - I'd argue my way is more consistent. I am using a
SizeValidator() which from my quick check most of the settings that use a size
validator, the default is a size with a unit. I'd also argue this is more
user-friendly. The usages of SizeValidator() that do not specify a unit look to
be using SizeValidator() incorrectly. For example - "hive.mv.files.thread" is
not a size. I could not use SizeValidator and just accept an integer but I
think that is removing a user friend way to set settings for something that is
more engineer oriented.
Both of the above points I can be swayed, I'm just sharing my thought
process.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]