[jira] [Work logged] (HIVE-25192) Nullify summary for non-native tables

ASF GitHub Bot (Jira) Mon, 26 Jul 2021 23:05:08 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25192?focusedWorklogId=628173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-628173
 ]


ASF GitHub Bot logged work on HIVE-25192:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jul/21 06:04
            Start Date: 27/Jul/21 06:04
    Worklog Time Spent: 10m 
      Work Description: dengzhhu653 commented on a change in pull request #2473:
URL: https://github.com/apache/hive/pull/2473#discussion_r677143487



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##########
@@ -2572,18 +2569,24 @@ public static ContentSummary getInputSummary(final Context ctx, MapWork work, Pa
 
       int numExecutors = getMaxExecutorsForInputListing(ctx.getConf(), pathNeedProcess.size());
       if (numExecutors > 1) {

Review comment:
       The problem is that the static global lock `INPUT_SUMMARY_LOCK` blocks 
other threads calling this method. `INPUT_SUMMARY_LOCK` is meant to keep the 
number of threads from growing out of control, but by default this method does 
not use a thread pool to get the summaries, so the lock can be converted to a 
finer-grained one that only guards the creation and execution of the thread 
pool. 
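
To illustrate the idea, here is a minimal sketch of such a finer-grained lock. It is not the code in `Utilities.java`: the class `InputSummarySketch`, the lock `POOL_LOCK`, the method `summarize` and the `lengthOf` callback are all hypothetical names used only for this example. The sequential path (the default, when only one executor is configured) takes no lock at all, and only the creation and execution of the thread pool is serialized.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.ToLongFunction;

final class InputSummarySketch {
  // Hypothetical stand-in for INPUT_SUMMARY_LOCK; guards only the pool.
  private static final Object POOL_LOCK = new Object();

  // Computes a length per path. lengthOf is an assumed callback that
  // stands in for whatever actually fetches a path's summary.
  static Map<String, Long> summarize(List<String> paths, int numExecutors,
      ToLongFunction<String> lengthOf) throws InterruptedException {
    Map<String, Long> result = new ConcurrentHashMap<>();

    if (numExecutors <= 1) {
      // Default path: no thread pool is used, so callers do not need
      // to wait for each other on a global lock.
      for (String path : paths) {
        result.put(path, lengthOf.applyAsLong(path));
      }
      return result;
    }

    // Only one caller at a time may create and run a pool, which
    // bounds the total number of listing threads.
    synchronized (POOL_LOCK) {
      ExecutorService pool = Executors.newFixedThreadPool(numExecutors);
      try {
        for (String path : paths) {
          pool.execute(() -> result.put(path, lengthOf.applyAsLong(path)));
        }
      } finally {
        pool.shutdown();
        // Arbitrary timeout chosen for the sketch.
        pool.awaitTermination(1, TimeUnit.MINUTES);
      }
    }
    return result;
  }
}
```

With this shape, concurrent callers on the sequential path proceed independently, while callers that ask for more than one executor still queue behind each other.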

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##########
@@ -2680,9 +2683,9 @@ public void run() {
                   total += estimator.estimate(jobConf, scanOp, -1).getTotalLength();
                 }
                 recordSummary(path, new ContentSummary(total, -1, -1));
-              } else {
-                // todo: should nullify summary for non-native tables,
-                // not to be selected as a mapjoin target
+              } else if (handler == null) {

Review comment:
       Thank you @pvary. The PR was originally meant to fix the case where a 
non-native table participates in a `join`: such a table is created with a path 
but has no data under it, so the non-native table may be selected as a mapjoin 
target, which can cause an OOM when building the hash table. The changes are 
all in one file, so I put them together.
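
The reasoning above can be sketched as follows. This is a simplified model, not Hive code: `NonNativeSummarySketch`, `TableInfo`, `inputSize` and `fitsMapJoin` are hypothetical names, and how Hive actually consumes a nullified summary is an assumption here. The point is that an empty warehouse path makes a non-native table look like a zero-byte input, so the sketch reports "unknown" for it unless its handler supplies an estimate.

```java
import java.util.OptionalLong;

final class NonNativeSummarySketch {
  /** Hypothetical table descriptor; not a Hive class. */
  static final class TableInfo {
    final boolean nonNative;      // true when a storage handler backs the table
    final long bytesUnderPath;    // what listing the warehouse path reports
    final OptionalLong estimate;  // size estimate from the handler, if any

    TableInfo(boolean nonNative, long bytesUnderPath, OptionalLong estimate) {
      this.nonNative = nonNative;
      this.bytesUnderPath = bytesUnderPath;
      this.estimate = estimate;
    }
  }

  // Returns the input size to use for join planning, or empty if unknown.
  static OptionalLong inputSize(TableInfo table) {
    if (table.estimate.isPresent()) {
      return table.estimate;                       // handler can estimate
    }
    if (!table.nonNative) {
      return OptionalLong.of(table.bytesUnderPath); // native: trust the path
    }
    // Non-native without an estimate: the path is empty, so its size
    // says nothing about the real data. Report "unknown" instead of 0.
    return OptionalLong.empty();
  }

  // Assumed selection rule: only a known, small-enough input qualifies.
  static boolean fitsMapJoin(TableInfo table, long thresholdBytes) {
    OptionalLong size = inputSize(table);
    return size.isPresent() && size.getAsLong() <= thresholdBytes;
  }
}
```

Under this model, a non-native table with an empty path is never picked as the build side merely because its location looks small.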




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 628173)
    Time Spent: 1h 50m  (was: 1h 40m)

> Nullify summary for non-native tables
> -------------------------------------
>
>                 Key: HIVE-25192
>                 URL: https://issues.apache.org/jira/browse/HIVE-25192
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When creating non-native tables such as Kudu or HBase tables, we create a 
> warehouse location for them, even though these tables may not use that 
> location to store data or for the job plan. We should skip getting the input 
> summary of non-native tables when optimising joins, as it may cause an OOM 
> when the non-native table is on the build side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
