[GitHub] drill pull request #822: DRILL-5457: Spill implementation for Hash Aggregate

paul-rogers Fri, 26 May 2017 22:38:30 -0700

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/822#discussion_r118811633
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ---
    @@ -136,15 +136,21 @@ public IterOutcome innerNext() {
           return IterOutcome.NONE;
         }
     
    -    if (aggregator.buildComplete() && !aggregator.allFlushed()) {
    -      // aggregation is complete and not all records have been output yet
    -      return aggregator.outputCurrentBatch();
    +    // if aggregation is complete and not all records have been output yet
    +    if (aggregator.buildComplete() ||
    +        // or: 1st phase need to return (not fully grouped) partial output due to memory pressure
    +        aggregator.earlyOutput()) {
    +      // then output the next batch downstream
    +      IterOutcome out = aggregator.outputCurrentBatch();
    --- End diff --
    
    Since `HashAggregator` is not an operator executor (AKA record batch), it
    does not have to follow the iterator protocol and use the `IterOutcome` enum.
    Instead, you can define your own. You won't need the `OK_NEW_SCHEMA`,
    `OUT_OF_MEMORY`, `FAIL` or `NOT_YET` values. All you seem to need is `OK`,
    `NONE` and `RESTART`.
    
    This approach avoids the need to change the `IterOutcome` enum and to
    expose your states to the whole Drill iterator protocol.
    
    I did something similar in Sort for the iterator class that returns either
    in-memory or merged spilled batches.
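
    To make the suggestion concrete, here is a minimal sketch of an
    aggregator-private outcome enum and how the operator could translate it.
    Everything named here is hypothetical and not taken from the PR:
    `AggOutcome`, `StubAggregator`, `IterOutcomeStub` (a two-value stand-in for
    Drill's `IterOutcome`) and `OperatorSketch`. The meaning given to `RESTART`
    (the operator simply calls the aggregator again) is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical outcome enum owned by the aggregator (name is an assumption).
enum AggOutcome {
  OK,       // a batch was produced
  NONE,     // no more output
  RESTART   // assumed meaning: caller should call the aggregator again
}

// Stand-in for the operator-level enum; only the values used here.
enum IterOutcomeStub { OK, NONE }

// Stub aggregator that replays a scripted sequence of outcomes.
class StubAggregator {
  private final Deque<AggOutcome> script = new ArrayDeque<>();

  StubAggregator(AggOutcome... outcomes) {
    for (AggOutcome o : outcomes) {
      script.add(o);
    }
  }

  AggOutcome outputCurrentBatch() {
    // Once the script is exhausted, keep reporting NONE.
    return script.isEmpty() ? AggOutcome.NONE : script.poll();
  }
}

// The operator maps the private states onto the iterator protocol,
// so RESTART never becomes visible downstream.
class OperatorSketch {
  static IterOutcomeStub next(StubAggregator agg) {
    while (true) {
      switch (agg.outputCurrentBatch()) {
        case OK:
          return IterOutcomeStub.OK;
        case NONE:
          return IterOutcomeStub.NONE;
        case RESTART:
          continue; // handled here; loop and ask the aggregator again
        default:
          throw new IllegalStateException("unexpected aggregator outcome");
      }
    }
  }
}
```

    With this shape, the extra state stays inside the aggregate package and
    the shared iterator enum is left untouched.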



