dengzhhu653 commented on a change in pull request #2473:
URL: https://github.com/apache/hive/pull/2473#discussion_r677143487
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##########
@@ -2572,18 +2569,24 @@ public static ContentSummary getInputSummary(final Context ctx, MapWork work, Pa
int numExecutors = getMaxExecutorsForInputListing(ctx.getConf(), pathNeedProcess.size());
if (numExecutors > 1) {
Review comment:
The problem is that the static global lock `INPUT_SUMMARY_LOCK` blocks every
other thread that calls this method. The lock exists to keep the number of
listing threads from growing out of control, but by default this method does
not use a thread pool to get the summaries at all, so it can be narrowed to a
finer-grained lock that guards only the creation and execution of the thread
pool, as sketched below.
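
To make the suggestion concrete, here is a minimal, self-contained sketch
assuming a simplified signature: `INPUT_SUMMARY_LOCK`, `pathNeedProcess`, and
the executor sizing mirror the diff above, while `summarize` is a hypothetical
placeholder for the real per-path summary computation.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class InputSummarySketch {
  // Hypothetical stand-in for the static lock in Utilities.
  private static final Object INPUT_SUMMARY_LOCK = new Object();

  static void getInputSummary(List<String> pathNeedProcess, int numExecutors)
      throws InterruptedException {
    if (numExecutors > 1) {
      // Hold the global lock only while a pool exists, so concurrent
      // callers still cannot multiply the total thread count.
      synchronized (INPUT_SUMMARY_LOCK) {
        ExecutorService executor = Executors.newFixedThreadPool(numExecutors);
        try {
          for (String path : pathNeedProcess) {
            executor.submit(() -> summarize(path));
          }
        } finally {
          executor.shutdown();
          executor.awaitTermination(1, TimeUnit.HOURS);
        }
      }
    } else {
      // Default single-threaded path: no pool, so no global lock and no
      // blocking of other callers.
      for (String path : pathNeedProcess) {
        summarize(path);
      }
    }
  }

  private static void summarize(String path) {
    // Placeholder for the real per-path ContentSummary computation.
  }
}
```

With this shape, concurrent pooled callers still serialize on the lock (so the
thread count stays bounded), but the common single-threaded path never touches
it.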
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##########
@@ -2680,9 +2683,9 @@ public void run() {
total += estimator.estimate(jobConf, scanOp, -1).getTotalLength();
}
recordSummary(path, new ContentSummary(total, -1, -1));
- } else {
- // todo: should nullify summary for non-native tables,
- // not to be selected as a mapjoin target
+ } else if (handler == null) {
Review comment:
Thank you @pvary. The PR originally tries to handle a non-native table that
participates in a `join`: such a table is created with a path but has no data
under it, so it may be selected as the mapjoin target, which can cause an OOM
when building the hashtable (see the sketch below). The changes all take place
in one file, so I put them together.
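
For reference, a hedged sketch of the branch structure this change produces:
`estimator`, `handler`, `recordSummary`, and the `estimate(...)` call are taken
from the diff, while the loop shape and the `fs.getContentSummary` fallback are
my assumptions about the surrounding code.

```java
// Branch structure after the change (sketch, not the full method body).
if (estimator != null) {
  long total = 0;
  for (TableScanOperator scanOp : scanOps) { // hypothetical loop
    total += estimator.estimate(jobConf, scanOp, -1).getTotalLength();
  }
  recordSummary(path, new ContentSummary(total, -1, -1));
} else if (handler == null) {
  // Native table: fall back to the filesystem's own summary (assumed).
  recordSummary(path, fs.getContentSummary(path));
}
// else: non-native table with no estimator -- record no summary at all, so
// the optimizer cannot mistake the empty path for a tiny mapjoin input.
```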
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]