[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823597#comment-17823597
 ] 

ASF GitHub Bot commented on DRILL-8479:
---------------------------------------

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128


##########
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##########
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {

Review Comment:
   @cgivre 
   In my test case ,it throw  "QueryCancelledException",because some 
minorfragment throw .OutOfMemoryException ,so it inform foreman failed.
   
   foreman send "QueryCancel" commands to other minorfragments. it  throws 
QueryCancelledException after the method "incoming.next()"   called method 
checkContinue() method
   
   Although the "checkContinue" phase throws a fixed "QueryCancelledException" 
message, I am not sure what is causing it (In my test case 
,OutOfMemoryException cause exception)
   
   
    ```
    public void clearInflightBatches() {
       while (lastOutcome == IterOutcome.OK || lastOutcome == 
IterOutcome.OK_NEW_SCHEMA) {
         // Clear all buffers from incoming.
         for (VectorWrapper<?> wrapper : incoming) {
           wrapper.getValueVector().clear();
         }
         lastOutcome = incoming.next();
       }
     }
      
      public void checkContinue() {
         if (!shouldContinue()) {
           throw new QueryCancelledException();
         }
       }
     }
   ```
   
   **stack**
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
           at 
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
           at 
org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
           at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
           at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
           at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
           at 
org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
           at 
org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
           at 
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```





> mergejion memory leak when  exception
> -------------------------------------
>
>                 Key: DRILL-8479
>                 URL: https://issues.apache.org/jira/browse/DRILL-8479
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.21.1
>            Reporter: shihuafeng
>            Priority: Critical
>         Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> megerjoin  leak when RecordIterator allocate memory exception with 
> OutOfMemoryException{*}{*}
> {*}Steps to reproduce the behavior{*}:
>  # prepare data for tpch 1s
>  # set direct memory 5g
>  #  set planner.enable_hashjoin =false  to  ensure use mergejoin operator。
>  #  set drill.memory.debug.allocator =true (Check for memory leaks )
>  # 20 concurrent for tpch sql8
>  # when it had OutOfMemoryException or null EXCEPTION , stopped all sql.
>  # finding memory leak
> *Expected behavior*
>       when all  sql sop , we should find direct memory is 0 AND  could not 
> find leak log like following.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 1000000/73728/4874240/10000000000 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to