[ 
https://issues.apache.org/jira/browse/DRILL-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027215#comment-16027215
 ] 

ASF GitHub Bot commented on DRILL-5457:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/822#discussion_r118811633
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ---
    @@ -136,15 +136,21 @@ public IterOutcome innerNext() {
           return IterOutcome.NONE;
         }
     
    -    if (aggregator.buildComplete() && !aggregator.allFlushed()) {
    -      // aggregation is complete and not all records have been output yet
    -      return aggregator.outputCurrentBatch();
    +    // if aggregation is complete and not all records have been output yet
    +    if (aggregator.buildComplete() ||
    +        // or: 1st phase need to return (not fully grouped) partial output due to memory pressure
    +        aggregator.earlyOutput()) {
    +      // then output the next batch downstream
    +      IterOutcome out = aggregator.outputCurrentBatch();
    --- End diff --
    
    Since `HashAggregator` is not an operator executor (AKA record batch), it 
does not have to follow the iterator protocol and use the `IterOutcome` enum. 
Instead, you can define your own. You won't need the `OK_NEW_SCHEMA`, 
`OUT_OF_MEMORY`, `FAIL` or `NOT_YET` values. All you seem to need is `OK`, 
`NONE` and `RESTART`.
    
    This approach avoids changing the `IterOutcome` enum and exposing your
    aggregator-specific states to every operator that uses the Drill iterator
    protocol.
    
    I did something similar in Sort for the iterator class that returns either
    in-memory or merged spilled batches.
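
    A minimal sketch of the idea, not the actual Drill code. Everything here is
    hypothetical and for illustration only: the `AggOutcome` enum, the
    `FakeAggregator` stub, and the two-value `IterOutcome` stand-in (the real
    enum has more values). It also assumes `RESTART` means "call the aggregator
    again", which the operator handles internally so the value never leaks into
    the iterator protocol.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class AggOutcomeSketch {

  // Stand-in for Drill's iterator-protocol enum; only the values used here.
  enum IterOutcome { OK, NONE }

  // Hypothetical outcome enum private to the aggregator (assumed names).
  enum AggOutcome { OK, NONE, RESTART }

  // Hypothetical stub that replays a scripted sequence of outcomes.
  static final class FakeAggregator {
    private final Deque<AggOutcome> script;

    FakeAggregator(AggOutcome... outcomes) {
      this.script = new ArrayDeque<>(Arrays.asList(outcomes));
    }

    AggOutcome outputCurrentBatch() {
      AggOutcome next = script.poll();
      // Once the script is exhausted, report that no data remains.
      return next == null ? AggOutcome.NONE : next;
    }
  }

  // Operator-side translation: RESTART is consumed here by looping,
  // so downstream operators only ever see protocol values.
  static IterOutcome innerNext(FakeAggregator aggregator) {
    while (true) {
      AggOutcome out = aggregator.outputCurrentBatch();
      switch (out) {
        case OK:
          return IterOutcome.OK;
        case NONE:
          return IterOutcome.NONE;
        case RESTART:
          continue; // assumed meaning: ask the aggregator again
        default:
          throw new IllegalStateException("Unexpected outcome: " + out);
      }
    }
  }

  public static void main(String[] args) {
    FakeAggregator agg =
        new FakeAggregator(AggOutcome.RESTART, AggOutcome.OK, AggOutcome.NONE);
    System.out.println(innerNext(agg)); // RESTART is skipped, then OK
    System.out.println(innerNext(agg)); // NONE
  }
}
```

    With this shape, only the operator maps aggregator states onto the
    protocol, so `IterOutcome` itself stays unchanged.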


> Support Spill to Disk for the Hash Aggregate Operator
> -----------------------------------------------------
>
>                 Key: DRILL-5457
>                 URL: https://issues.apache.org/jira/browse/DRILL-5457
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>             Fix For: 1.11.0
>
>
> Support gradual spilling memory to disk as the available memory gets too 
> small to allow in memory work for the Hash Aggregate Operator.



