[GitHub] drill issue #905: DRILL-1162: Fix OOM for hash join operator when the right ...

vvysotskyi Wed, 13 Sep 2017 05:58:51 -0700

Github user vvysotskyi commented on the issue:

    https://github.com/apache/drill/pull/905
  
    @jinfengni, the idea is very interesting for me. More appropriate memory 
cost estimation will cause to choosing better plans, that will cause the less 
time execution and memory costs. 
    
    Writing about the comparing the ratio of column count vs ratio of row count 
I meant that the multiplying of row count and column count does not help to 
choose which input needs more memory in the case when the ratio of **actual** 
row count is much greater than the ratio of column count. 
`getCumulativeMemCost` implementation will not help for this case. 
    To show that, let's consider an example. We have three tables: 
    
    - a(has 5 columns and 5 rows with the same values); 
    - b(has 5 columns and 10 rows with the same values as in the table a); 
    - c(has 5 columns and 35 rows with the same values as in the table a). 
    
    For the query
    ```
    select count(*) from a
    inner join b on a.col1=b.col1
    inner join c on b.col1=c.col1;
    ```
    Drill will build plan
    ```
        HahJoin[1]
        /    \
           c   HahJoin[2]
           /    \
          b      a
    ```
    `getCumulativeMemCost` for HahJoin[1] will return a value proportional to
    `(HahJoin[2] row count) * (HahJoin[2] column count) + 
HahJoin[2].getCumulativeMemCost() = Max(aRowCount, bRowCount) * (aColumnCount + 
bColumnCount) +  aRowCount * aColumnCount = 5 * (5 + 5) + 5 * 5 = 75`.
    Actual row count for build side of HahJoin[1] in this case is 50.
    For the plan, that will be more suitable for this particular case:
    ```
        HahJoin[1]
        /     \
      HahJoin[2]   c   
       /    \
      b  a
    ```
    `getCumulativeMemCost` for HahJoin[1] will return a value proportional to 
`cRowCount * cColumnCount = 35 * 5 = 175`.
    Actual row count for build side of HahJoin[1] in this case is 35.
    
    Smal information about this pull request.
    This pull request addresses only the case of large row count. It checks 
that OOM may happen and if swap allows avoiding this potential OOM, the swap 
will happen.

---

[GitHub] drill issue #905: DRILL-1162: Fix OOM for hash join operator when the right ...

Reply via email to