Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/838#discussion_r117536637
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
    @@ -173,9 +174,8 @@ public IterOutcome next() {
     
             currentReader.allocate(mutator.fieldVectorMap());
           } catch (OutOfMemoryException e) {
    -        logger.debug("Caught Out of Memory Exception", e);
             clearFieldVectorMap();
    -        return IterOutcome.OUT_OF_MEMORY;
    +        throw UserException.memoryError(e).build(logger);
    --- End diff --
    
    As it turns out, the idea of the OUT_OF_MEMORY return code works better in 
theory than in practice. No reader correctly handles this case. Let's say we 
have three columns (a, b, c), and that column c needs to double its vector but 
hits OOM. No reader has the internal state needed to hold onto the value for 
c, unwind the call stack, and then, on the next next() call, rewind back to 
the point of writing c into the in-flight row.
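    To make the failure mode concrete, here is a minimal sketch (class and 
method names are assumptions for illustration, not Drill's actual reader or 
vector API). The value for column c exists only as a local inside the write 
path; when the vector write fails, unwinding the stack loses that value and 
leaves the three vectors uneven, so a later next() call has nothing to resume 
from:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Drill's actual API.
public class ReaderSketch {
  static class OomException extends RuntimeException {}

  static class Vector {
    final List<Object> values = new ArrayList<>();
    boolean failNextWrite;              // simulate OOM on the next doubling

    void write(Object v) {
      if (failNextWrite) {
        failNextWrite = false;
        throw new OomException();       // stand-in for an OOM during doubling
      }
      values.add(v);
    }
  }

  final Vector a = new Vector(), b = new Vector(), c = new Vector();

  /** Writes one row; returns false if the row could not be completed. */
  boolean writeRow(Object va, Object vb, Object vc) {
    try {
      a.write(va);
      b.write(vb);
      c.write(vc);                      // OOM here: a and b are already ahead
      return true;
    } catch (OomException e) {
      return false;                     // vc is lost; vectors are now uneven
    }
  }
}
```

    After the failed row, vectors a and b each hold one more value than c, 
and the value intended for c is gone; that is the state no reader knows how 
to repair.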
    
    Moving forward, we want to take a broader approach to memory: budget 
sufficient memory for readers to do their work, and modify the mutators to 
enforce batch size limits so that each reader operates within its budget.
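    Roughly, the budgeted-mutator direction could look like this (all names 
here are hypothetical, sketched for discussion rather than taken from Drill's 
actual Mutator interface):

```java
// Hypothetical sketch: a mutator that enforces a per-batch byte budget so
// the reader fills batches to a known limit instead of allocating until it
// hits an unrecoverable mid-row OOM.
public class BudgetedMutator {
  private final long batchBudgetBytes;
  private long bytesUsed;
  private int rowCount;

  public BudgetedMutator(long batchBudgetBytes) {
    this.batchBudgetBytes = batchBudgetBytes;
  }

  /**
   * Returns false when the current batch is full; the reader should then
   * send the batch downstream and call reset() before writing more rows.
   */
  public boolean startRow(long estimatedRowBytes) {
    if (bytesUsed + estimatedRowBytes > batchBudgetBytes) {
      return false;                     // batch full: flush, then reset()
    }
    bytesUsed += estimatedRowBytes;
    rowCount++;
    return true;
  }

  public int rowCount() {
    return rowCount;
  }

  public void reset() {
    bytesUsed = 0;
    rowCount = 0;
  }
}
```

    The point of the design is that "out of budget" is a normal, checkable 
condition at a row boundary, rather than an exception thrown mid-row.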
    
    As we move to that approach, the OUT_OF_MEMORY status will be retired.
    
    The JIRA mentions another JIRA that holds a spec for all of this; 
something we discussed six months ago but did not have time to implement then.
    
    This all merits a complete discussion; maybe we can discuss the overall 
approach in that other JIRA.

