Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/838#discussion_r117622601
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
    @@ -173,9 +174,8 @@ public IterOutcome next() {
     
             currentReader.allocate(mutator.fieldVectorMap());
           } catch (OutOfMemoryException e) {
    -        logger.debug("Caught Out of Memory Exception", e);
             clearFieldVectorMap();
    -        return IterOutcome.OUT_OF_MEMORY;
    +        throw UserException.memoryError(e).build(logger);
    --- End diff --
    
    You raise two good points.
    
    The first is that Drill tends to use unchecked exceptions because they are convenient, and because Drill must in any case handle built-in unchecked exceptions such as NPE, illegal state, illegal argument, and so on.
    
    But normal Java practice is to declare exceptions so that callers know what they must handle. DRILL-5386 asks to rationalize exception use; comments in that ticket from project veterans show some resistance to the idea of checked exceptions. So, yes, we must expect any unchecked exception from any method. This is why operators should handle all exceptions, and why we need code to sort out exceptions based on type.
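    To make the "sort out exceptions based on type" idea concrete, here is a minimal sketch of the classification an operator's catch-all might do. All names here are hypothetical stand-ins, not Drill's actual classes; the real code would map these categories onto UserException builders.

```java
// Hypothetical sketch: classify an exception caught at an operator boundary.
// Class and enum names are illustrative, not Drill's real API.
public class ExceptionClassifier {

  public enum Kind { USER_ERROR, OUT_OF_MEMORY, INTERNAL }

  public static Kind classify(Throwable t) {
    // Walk the cause chain: the interesting exception may be wrapped.
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof OutOfMemoryError) {
        return Kind.OUT_OF_MEMORY;
      }
      if (cur instanceof IllegalArgumentException
          || cur instanceof IllegalStateException
          || cur instanceof NullPointerException) {
        return Kind.INTERNAL;   // programming error surfaced as unchecked
      }
    }
    return Kind.USER_ERROR;     // default: report to the user
  }
}
```

    The point is only that a single, central classifier is what lets operators treat "any unchecked exception from any method" uniformly.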
    
    The analysis of OOM is correct, but omits context. It is seldom (never?) 
that a sort sits directly above a scan. Seems most often there is a filter or 
project between them. If the scan hits OOM, it is not likely that the scan has exhausted the memory limit on the scan operator: each operator defaults to a 10 GB limit. Instead, it is likely that overall system memory is exhausted. So, 
what is likely to happen? Each operator between the scan and sort must handle 
the OUT_OF_MEMORY status by bubbling it up the call stack. Let's assume that 
works.
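    The "bubbling up" obligation on each intermediate operator can be sketched as follows. The enum and interface are simplified stand-ins, not Drill's real RecordBatch API; the sketch just shows that every operator in between must pass the status through untouched for the scheme to work.

```java
// Hypothetical sketch of an intermediate operator (e.g. a filter) passing an
// upstream OUT_OF_MEMORY status to its caller without processing a batch.
public class FilterSketch {

  public enum IterOutcome { OK, OUT_OF_MEMORY, NONE }

  public interface Upstream { IterOutcome next(); }

  private final Upstream incoming;

  public FilterSketch(Upstream incoming) { this.incoming = incoming; }

  public IterOutcome next() {
    IterOutcome outcome = incoming.next();
    switch (outcome) {
      case OUT_OF_MEMORY:
        return IterOutcome.OUT_OF_MEMORY;  // propagate; downstream may free memory
      case OK:
        // ... filter the incoming batch here ...
        return IterOutcome.OK;
      default:
        return outcome;
    }
  }
}
```

    One operator that forgets this case silently breaks the whole recovery path, which is part of why the scheme is fragile.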
    
    The sort now wants to spill. Spilling is an activity that requires memory 
to perform. Spilling requires a merge phase to combine the records from 
buffered batches in sort order so that the spilled run is sorted. That is, the 
sort must allocate a batch, often many MB in size. (Spilled runs must be 
sizable so we can limit the number of spilled runs merged in the final merge 
phase.)
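    The back-of-envelope arithmetic behind "spilled runs must be sizable": in the final merge, the sort needs one read-ahead batch in memory per spilled run, so its memory budget bounds the merge width. The numbers below are illustrative, not Drill's actual defaults.

```java
// Sketch: with a fixed sort budget, merge width is bounded by
// budget / read-ahead-batch-size. Illustrative numbers only.
public class MergeWidth {

  public static long maxRunsMergeable(long sortBudgetBytes, long batchBytes) {
    return sortBudgetBytes / batchBytes;
  }

  public static void main(String[] args) {
    // 100 MB budget, 8 MB read-ahead batches: at most 12 runs per merge pass.
    System.out.println(maxRunsMergeable(100L << 20, 8L << 20));
  }
}
```

    Larger spilled runs mean fewer runs overall, which keeps the final merge within this bound without extra merge passes.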
    
    So, the sort tries to allocate vectors for the merge batch and... The 
allocation fails. Why? Because we are out of system memory -- that's why the 
scanner triggered an OOM.
    
    I can find no code that sets up this out-of-system-memory condition to 
verify that existing code works. I think we were taking it on faith that this 
behavior actually works.
    
    Moving forward, we are working toward a workable solution: assign the scan some amount of memory, and limit batches to fit within that memory; give the sort a certain amount of memory, and have it manage within that budget so that when a spill occurs, the sort has sufficient memory to create the required merge batches as part of the spill.
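    A minimal sketch of that discipline, with hypothetical names (not Drill's allocator API): the sort holds back a reserve for merge batches, and spills *before* buffering would eat into the reserve, so the spill itself can never fail for lack of memory.

```java
// Hypothetical sketch: a sort that lives within a fixed budget and reserves
// the merge-batch memory up front. Not Drill's real allocator interface.
public class BudgetedSort {
  private final long budget;        // total bytes granted to this operator
  private final long spillReserve;  // bytes held back for merge batches
  private long buffered;            // bytes currently buffered in memory

  public BudgetedSort(long budget, long spillReserve) {
    this.budget = budget;
    this.spillReserve = spillReserve;
  }

  /** True if another batch of the given size fits without touching the
   *  reserve; false means: spill now, while the reserve is still intact. */
  public boolean offer(long batchBytes) {
    if (buffered + batchBytes > budget - spillReserve) {
      return false;               // caller spills; reserve covers merge batch
    }
    buffered += batchBytes;
    return true;
  }

  /** Called after a spill completes: buffered batches are now on disk. */
  public void spilled() { buffered = 0; }
}
```

    The key design choice is that the decision to spill is made while memory is still available, rather than in reaction to a failed allocation.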
    
    Finally, let's consider the case you outlined: the scan fails with OOM on 
the initial allocation. The initial allocation is often small; the scan goes 
through many vector doublings to read the full complement of rows. (At least, that's what I've observed empirically; perhaps the original intent was different.) Let's say we tried to recover from an initial allocation failure.
    
    Say we have a row with five columns. We allocate three, but fail on the fourth. Say the fourth is a Varchar, which has two vectors: offset and data. The current logic releases the partially-allocated vectors, which is good. OUT_OF_MEMORY is returned, and the vectors are reallocated later if memory could be freed in the meantime. Sounds good.
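    The rollback step described above looks roughly like this sketch. The Vector interface is a stand-in for Drill's ValueVector, simplified to the two calls that matter here; the exception type is also a stand-in for the allocator's OOM signal.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rollback behavior: allocate vectors one by one, and if any
// allocation fails, release the ones already allocated so their memory is
// returned. Vector is a stand-in, not Drill's ValueVector.
public class AllocRollback {

  public interface Vector {
    void allocate();  // throws IllegalStateException on OOM (stand-in)
    void clear();     // release backing memory
  }

  /** Returns true if all vectors were allocated; on failure, clears the
   *  partially allocated set and returns false (the old OUT_OF_MEMORY path). */
  public static boolean allocateAll(List<Vector> vectors) {
    List<Vector> done = new ArrayList<>();
    for (Vector v : vectors) {
      try {
        v.allocate();
        done.add(v);
      } catch (IllegalStateException oom) {
        for (Vector d : done) {
          d.clear();              // roll back: free the partial allocation
        }
        return false;
      }
    }
    return true;
  }
}
```

    As argued below, the rollback itself is correct; the problem is what happens to the freed memory once other threads are in the picture.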
    
    But, most queries run in multiple threads. If one hits OOM, then the others 
probably will as well. The actual system memory is a shared resource, but there 
is no coordination. A scan might release its partially-allocated vectors so the 
sort can, in theory, spill. But, that memory might immediately be grabbed by 
some other thread, resulting in a sort spill OOM. In practice, however, the 
initial vector allocations are much smaller than the merge batch, so it is only 
slightly useful to free up the initial allocation. That initial allocation, 
plus luck that some other thread has freed enough memory, might allow us to 
spill. But it is a crap-shoot.
    
    In short, even if this logic might work in some scenarios in a single-threaded query, it is too chaotic to work in general with many threads. And, of course, no tests exist for either case, so we are just guessing.
    
    All-in-all, the above argues strongly that the path forward is to:
    
    1. Rationalize error handling: OOM errors cause query failure.
    2. Design a memory assignment system so that operators live within a budget.
    3. Design tests to ensure that the system works rather than relying on hope.
    
    This checkin is a step toward goal 1. The external sort revision, hash agg 
spilling and other projects are steps toward goal 2. We continue to chip away 
at our ability to do goal 3.
    
    Given all of this, can you suggest how we could gather evidence that the current OUT_OF_MEMORY status actually works in any real queries? Or do we have the tough case of weighing concrete changes against an aspiration for how the system might work?
    
    More practically, with this change, OOM will fail the query. Previously, there was some chance that Drill might recover from an OOM. But we have no tests and no statistics. Is it worth risking that hypothetical recovery for a concrete step in the right direction? I don't think we have the answer, and that, itself, is a problem. Thoughts?

