Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/838#discussion_r117536637

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java ---
@@ -173,9 +174,8 @@ public IterOutcome next() {
           currentReader.allocate(mutator.fieldVectorMap());
         } catch (OutOfMemoryException e) {
-          logger.debug("Caught Out of Memory Exception", e);
           clearFieldVectorMap();
-          return IterOutcome.OUT_OF_MEMORY;
+          throw UserException.memoryError(e).build(logger);
--- End diff --

As it turns out, the idea of the OUT_OF_MEMORY return code works better in theory than in practice: no reader correctly handles this case. Suppose we have three columns (a, b, c), and column c needs to double its vector but hits OOM. No reader has the internal state needed to hold onto the value for c, unwind the call stack, and then, on the next next() call, rewind back to the point of writing c into the in-flight row.

Moving forward, we want to take a broader approach to memory: budget sufficient memory for readers to work, and modify the mutators so that they enforce batch size limits, keeping each reader within its budget. As we move to that approach, the OUT_OF_MEMORY status will be retired.

The JIRA mentions another JIRA that holds a spec for all of this; it is something we discussed six months ago but did not have time to implement then. This all merits a complete discussion; perhaps we can discuss the overall approach in that other JIRA.
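To illustrate the direction described above, here is a minimal sketch of a budget-enforcing mutator. This is not Drill's actual mutator API; the class and method names (`BudgetedMutator`, `hasCapacity`, `rowWritten`) are hypothetical, and real code would track per-vector allocations rather than a single byte counter. The point is only the shape of the contract: the reader asks for capacity before writing a row, so the batch is closed out cleanly instead of the reader hitting OOM mid-row.

```java
// Hypothetical sketch: a mutator that enforces a batch-size budget so the
// reader stops writing rows before memory allocation can fail mid-row.
public class BudgetedMutator {
  private final int maxRows;     // row-count limit for one batch
  private final long maxBytes;   // memory budget for one batch
  private int rowCount;
  private long bytesUsed;

  public BudgetedMutator(int maxRows, long maxBytes) {
    this.maxRows = maxRows;
    this.maxBytes = maxBytes;
  }

  /** True if the in-flight batch still has room for one more row. */
  public boolean hasCapacity(long estimatedRowBytes) {
    return rowCount < maxRows && bytesUsed + estimatedRowBytes <= maxBytes;
  }

  /** Record that a row of the given size was written into the batch. */
  public void rowWritten(long rowBytes) {
    rowCount++;
    bytesUsed += rowBytes;
  }

  public int rowCount() {
    return rowCount;
  }
}
```

A reader's loop would then check `hasCapacity()` before writing each row and return the current batch when the budget is exhausted, rather than discovering an OOM partway through writing column c.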