GitHub user Ben-Zvi opened a pull request:

    https://github.com/apache/drill/pull/871

    DRILL-5616: Add memory checks, plus minor metrics changes

    This PR addresses the problem of OOM when handling irregular input (e.g., 
grouping keys' sizes varies from 8 to 250 bytes long), mainly by adding memory 
checks (to trigger spilling more often). This PR also contains some other minor 
changes. Detail:
    
    (1) Added a flag `needToCheckIfSpillIsNeeded` in 
`checkGroupAndAggrValues()` that is set true whenever we suspect a memory 
pressure. The cases are:
    (1.a) When the HashTable put() returned KEY_ADDED_LAST (like the prior code 
did).
    (1.b) When the put() allocated memory, other than when adding a new batch. 
(E.g., hash table doubling, buffer reallocation, etc).
    (1.c) When put() allocated a new batch, then we allocated a matching aggr 
batch, and the total memory allocated was larger than the estimated max batch 
size.
    
    (2) With the above change, no longer need to compute space for "all hash 
tables doubling" (as we'll check after each doubling). So removed 
`extraMemoryNeededForResize()` from the HashTable. Instead just calculating the 
Max hash table double, in case one doubles. (Calculate by multiplying sizes etc 
- see line 1243 in HashAggTemplate).
    
    (3) Also changed  `plannedBatches`  to  `Math.max(1, plannedBatches)` 
because now the memory check may be called when no new batch is planned.
    
    (4) When calculating the possible number of partitions (in delayedSetup(), 
line 401), reduce the overhead factor from 8M to 2M, as now we check memory 
more often. (This will allow more partitions).
    
    (5) In doWork(): Before the estimated batch size was updated when the 
next() incoming produced a bigger input batch. Now changed this to also apply 
for next() from the spill file. The reason was this crazy test case, when the 
first batch from spill was 5M, then the next batch was 27M !!  This fix is not 
perfect, we could still have OOMed there. But if not OOming, this change will 
expedite spills, and prevent later OOMs. (Maybe with a better sizer we could 
check the batch when spilling and adjust then; currently the sizer shows zero 
size on the batch before spilling).
    
    Other changes (beyond DRILL-5616):
    
    (6) The splitAndTransfer() used originally by the HAG to "move" the key 
columns actually allocates the offset vector and copies all the values (for 
varchars, a common key type). This is a memory and time waste !! Probably an 
old code leftover (from before "realloc" was used; see DRILL-1111).
    Changed it to a simple transfer() for the normal cases (offset runs from 
zero to the max expected number). The non-normal cases probably never happen; 
maybe later should be removed (i.e., undoing the DRILL-1111 fix).
    Implementation - pass `numPendingOutput` to outputKeys(), and check against 
this number.
    
    (7) Metrics: Renamed RESIZING_TIME to RESIZING_TIME_MS (Chun asked for 
that) - also for the Hash Join.
    
    (8) Metrics: SPILL_MB was set inconsistently - in two places (one only when 
non-zero, the other did not check, hence sometimes produced zero). Added the 
non-zero check (line 968).
    
    (9) Metrics: Produce the Cycle number even when not spilling (i.e. 0) 
(Rahul asked for that).
    
    (10) For debugging: Added a prefix to the OOM error message (to tell if 
coming from the HashTable or the Aggr batch).
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ben-Zvi/drill DRILL-5616

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #871
    
----
commit 38bd3fc4d618f5b17935d5e94273511ac367bdbf
Author: Boaz Ben-Zvi <[email protected]>
Date:   2017-07-11T00:53:28Z

    DRILL-5616: Add memory checks, plus minor metrics changes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to