Jinfeng Ni created DRILL-326:
--------------------------------

             Summary: Multi-table merge join hit IOBE; merge join may over-allocate memory
                 Key: DRILL-326
                 URL: https://issues.apache.org/jira/browse/DRILL-326
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni


The following query, which joins four tables, can hit an IndexOutOfBoundsException:

message: "Failure while running fragment. < IndexOutOfBoundsException:[ index: 2205764, length: 4 (expected: range(0, 2205764))

SELECT S.S_ACCTBAL, S.S_NAME, N.N_NAME
FROM
 ( SELECT _MAP['P_PARTKEY'] as P_PARTKEY,
          _MAP['P_MFGR'] as P_MFGR
   FROM "/Users/jni//work/tpc-h-parquet/part") P,
 ( SELECT _MAP['S_SUPPKEY'] AS S_SUPPKEY,
          _MAP['S_NATIONKEY'] AS S_NATIONKEY,
          _MAP['S_ACCTBAL'] AS S_ACCTBAL,
          _MAP['S_NAME']  AS S_NAME,
          _MAP['S_ADDRESS'] AS S_ADDRESS,
          _MAP['S_PHONE'] AS S_PHONE,
          _MAP['S_COMMENT'] AS S_COMMENT
   FROM "/Users/jni//work/tpc-h-parquet/supplier") S,
 (SELECT _MAP['PS_PARTKEY'] AS PS_PARTKEY,
         _MAP['PS_SUPPKEY'] AS PS_SUPPKEY
  FROM "/Users/jni//work/tpc-h-parquet/partsupp") PS,
 ( SELECT  CAST(_MAP['N_NAME'] AS VARCHAR) AS N_NAME,
           _MAP['N_NATIONKEY'] AS N_NATIONKEY
   FROM "/Users/jni//work/tpc-h-parquet/nation" ) N
WHERE P.P_PARTKEY  = PS.PS_PARTKEY and
      S.S_SUPPKEY = PS.PS_SUPPKEY and
      S.S_NATIONKEY = N.N_NATIONKEY
LIMIT 100;

The root cause of this IOBE is that merge join continues to increment the output
position even when the copy from the left or right input fails. The merge join's
output batch size can then exceed the buffer capacity, which results in an IOBE
during downstream batch processing.
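To make the failure mode concrete, here is a minimal Java sketch under simplifying assumptions: a single int column stands in for the output value vectors, and the hypothetical `OutputVector.copyFrom` returns `false` when a row does not fit, loosely mirroring a failed copy. None of these names come from Drill's code. `buggyCopy` advances the output position regardless of the copy result, so the reported record count can exceed the buffer capacity; `fixedCopy` stops at the first failed copy so the current batch can be emitted and the remaining rows carried into the next one. `sumBatch` plays the role of a downstream operator that trusts the record count.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of a merge join output loop
 * (illustrative only; not Drill's MergeJoinBatch implementation).
 */
class MergeJoinCopySketch {

    /** Hypothetical fixed-capacity output column. */
    static final class OutputVector {
        private final int[] data;

        OutputVector(int capacity) {
            data = new int[capacity];
        }

        /** Copies a value into slot {@code index}; returns false if it does not fit. */
        boolean copyFrom(int index, int value) {
            if (index >= data.length) {
                return false; // out of space: the caller should stop and emit the batch
            }
            data[index] = value;
            return true;
        }
    }

    /** Buggy pattern: the output position advances even when the copy failed. */
    static int buggyCopy(int[] inputRows, OutputVector out) {
        int outputPosition = 0;
        for (int value : inputRows) {
            out.copyFrom(outputPosition, value); // result ignored
            outputPosition++;                    // keeps growing past the capacity
        }
        return outputPosition; // reported record count of the batch
    }

    /** Fixed pattern: advance only after a successful copy; otherwise stop. */
    static int fixedCopy(int[] inputRows, OutputVector out) {
        int outputPosition = 0;
        for (int value : inputRows) {
            if (!out.copyFrom(outputPosition, value)) {
                break; // emit this batch; the remaining rows go into the next one
            }
            outputPosition++;
        }
        return outputPosition;
    }

    /** Downstream consumer that trusts the reported record count. */
    static long sumBatch(OutputVector out, int recordCount) {
        long sum = 0;
        for (int i = 0; i < recordCount; i++) {
            sum += out.data[i]; // throws ArrayIndexOutOfBoundsException if recordCount > capacity
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] rows = new int[10];
        Arrays.fill(rows, 7);

        OutputVector buggyOut = new OutputVector(4);
        int buggyCount = buggyCopy(rows, buggyOut);
        try {
            sumBatch(buggyOut, buggyCount);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("downstream failure with record count " + buggyCount);
        }

        OutputVector fixedOut = new OutputVector(4);
        int fixedCount = fixedCopy(rows, fixedOut);
        System.out.println("fixed record count: " + fixedCount
            + ", sum: " + sumBatch(fixedOut, fixedCount));
    }
}
```

With ten input rows and a capacity of four, the buggy loop reports ten records and the downstream read fails on the first out-of-range slot, the array analogue of the IOBE above; the fixed loop reports four.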

This bug also exposes two other issues:

1) We need a way to verify that each batch size is within the 65535 limit.
This would make similar problems easier to debug in the future: if a code bug
causes the batch size to go beyond the limit, we could catch the issue right
away, instead of continuing execution and hitting an error later in downstream
batch processing.
 
2) The merge join batch may allocate buffers for the value vectors copied from
the left and right inputs using different row counts. In a join, these counts
should be equal, since each output row holds values from both sides. Using
different row counts can lead to unnecessary memory overhead. Also, the merge
join batch size should be bounded by the 65535 limit.
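The two follow-up items could look roughly like the Java sketch below. It is an illustration under stated assumptions, not Drill's API: `BatchSizeChecks`, `validateRecordCount`, `outputAllocationRowCount`, and `allocateOutput` are hypothetical names, and plain `int[]` arrays stand in for the left- and right-side value vectors. The check fails fast in the operator that produced an oversized batch, and the allocation uses a single row count for both sides, capped at 65535.

```java
/**
 * Hypothetical helpers for the two follow-up items (illustrative names only):
 * (1) fail fast when a batch's record count exceeds the 65535 limit;
 * (2) size left- and right-side output vectors with one shared, capped row count.
 */
class BatchSizeChecks {

    /** Maximum records per batch, as stated in the issue. */
    static final int MAX_BATCH_RECORDS = 65535;

    /** Throws immediately instead of letting an oversized batch flow downstream. */
    static void validateRecordCount(String operator, int recordCount) {
        if (recordCount < 0 || recordCount > MAX_BATCH_RECORDS) {
            throw new IllegalStateException(operator + " produced a batch of "
                + recordCount + " records; the limit is " + MAX_BATCH_RECORDS);
        }
    }

    /** One row count for both join sides, clamped to [0, MAX_BATCH_RECORDS]. */
    static int outputAllocationRowCount(int requestedRows) {
        return Math.min(Math.max(requestedRows, 0), MAX_BATCH_RECORDS);
    }

    /**
     * Every join output row holds one left and one right value, so both
     * stand-in vectors are allocated with the same number of slots.
     */
    static int[][] allocateOutput(int requestedRows) {
        int rows = outputAllocationRowCount(requestedRows);
        return new int[][] { new int[rows], new int[rows] }; // [0] = left, [1] = right
    }

    public static void main(String[] args) {
        int[][] out = allocateOutput(100_000);
        System.out.println("left: " + out[0].length + ", right: " + out[1].length);
        validateRecordCount("MergeJoin", out[0].length); // within the limit: no exception

        try {
            validateRecordCount("MergeJoin", 70_000);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Failing in the producing operator points directly at the faulty code, rather than at whichever downstream operator first touches an out-of-range index.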




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
