belugabehr commented on a change in pull request #1067:
URL: https://github.com/apache/hive/pull/1067#discussion_r436092979
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
##########
@@ -990,24 +955,17 @@ private void flushHashTable(boolean complete) throws
HiveException {
// changed in the future
if (complete) {
- Iterator<Map.Entry<KeyWrapper, AggregationBuffer[]>> iter =
hashAggregations
- .entrySet().iterator();
- while (iter.hasNext()) {
- Map.Entry<KeyWrapper, AggregationBuffer[]> m = iter.next();
- forward(m.getKey().getKeyArray(), m.getValue());
+ for (Map.Entry<KeyWrapper, AggregationBuffer[]> entry :
hashAggregations.entrySet()) {
+ forward(entry.getKey().getKeyArray(), entry.getValue());
}
- hashAggregations.clear();
hashAggregations = null;
- if (LOG.isInfoEnabled()) {
- LOG.info("Hash Table completed flushed");
- }
+ LOG.info("Hash Table completed flushed");
return;
}
int oldSize = hashAggregations.size();
- if (LOG.isInfoEnabled()) {
- LOG.info("Hash Tbl flush: #hash table = " + oldSize);
- }
+ LOG.info("Hash Tbl flush: #hash table = " + oldSize);
Review comment:
Thanks for chiming in @pgaref
I love that you're passionate about using the parameters. I usually like to
do this, but it's 50-50 on who likes to see it and who not. My point is that
using parameters:
> can significantly boost logging performance for _disabled_ logging
statement.
The problem is that it adds overhead when there are _enabled_ logging
statements. For an INFO level logging, it's almost certainly enabled in a
production environment so using the anchor ({}) just adds needless overhead.
Think about it, what is faster?
```
public static void main(String[] args) {
info1("foo: ", "bar");
info2("foo: {}", "bar");
}
void info1(String s, String param) {
System.out.println(s.concat(param));
}
void info2(String s, String param) {
int index = s.lastIndexOf("{");
if (index < 0) {
System.out.println(s);
} else {
if (s.lastIndexOf{"}") == (index + 1) {
String firstHalf= s.subString(0, index);
String secondHalf = s.subString(index + 2) s.length());
System.out.println(firstHalf + param + secondHalf);
} else {
System.out.println(s);
}
}
```
Clearly the second option is going to be slower. `Info1` generates a string
certainly, and `Info2` does not. However, if the application is almost always
going to be running at the INFO level, it doesn't really make a lot of sense to
have to always go down `Info2`.
That said, there is certainly value in having consistency (always using
parameters), and I think messages are easier to read when formatted this way
than when they are all piecemeal with the "+" signs, but if you want to
understand the raw performance, then what is currently there is preferable in
an environment where the log level is set to a minimum of INFO (which almost
every environment I've worked on is).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]