[ 
https://issues.apache.org/jira/browse/DRILL-5942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242689#comment-16242689
 ] 

Timothy Farkas commented on DRILL-5942:
---------------------------------------

My initial analysis of this issue is as follows:

This is a very complex issue which will take a considerable amount of time to 
solve correctly. Rehashing the points Paul has already made in various 
discussions, I think this work needs to be tackled in two main phases.

h2. Phase 1: Running Queries Non-Concurrently Without Running Out of Memory

h3. Goal

The goal here would be to run one query at a time successfully in all cases. I 
think this is possible to achieve with incremental improvements to the existing 
architecture. *Note:* I think achieving *Phase 2* will require significant 
changes to Drill's architecture.

h3. Tasks

In order to avoid out-of-memory exceptions in the single-query case, it is 
necessary and sufficient to solve all of the following sub-tasks.

  * Make each operator memory aware. Given a specific memory budget, each 
operator must be capable of obeying it. All the operators need to be analyzed 
and made memory aware.
    * *Relevant Pending Work:* The HashJoin work Boaz is doing.
  * Account for the memory used by Drillbit-to-Drillbit communication. 
Currently, exchanges use buffers on both the sending and receiving Drillbits. 
These buffers can use a significant amount of memory. We would have to be able 
to set a limit on the amount of memory used by these buffers and make the 
exchange operators smart enough to obey the limit.
    * *Relevant Pending Work:* The exchange operator work Vlad is doing.
  * Make Drill aware of the amount of direct memory allocated to the JVM. This 
is necessary because the Drillbit needs to know how much memory it has 
available to allocate to operators and buffers.
  * Control batch sizes. We cannot effectively obey memory limits in the 
operators and buffers unless we can bound the size of batches. 
    * *Relevant Pending Work:* The work Paul is doing to limit batch sizes.
  * Once everything above is satisfied, we need to test the Parallelizer code 
to make sure it doesn't over-allocate memory. I believe there are cases where 
incorrect configuration can cause the Parallelizer to grossly over-allocate 
memory.
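To make the first and fourth tasks concrete, here is a minimal sketch of the contract a memory-aware operator would have to obey: reserve batch memory against a fixed budget, and spill rather than exceed it. All of the class and method names ({{MemoryBudget}}, {{BufferingOperator}}, etc.) are illustrative assumptions, not actual Drill APIs.

```java
// Hypothetical sketch of a per-operator memory budget. These are NOT
// Drill classes; they only illustrate the contract each operator must obey.
public class MemoryBudgetSketch {

    /** Tracks how much of a fixed budget an operator has reserved. */
    static final class MemoryBudget {
        private final long limitBytes;
        private long reservedBytes;

        MemoryBudget(long limitBytes) { this.limitBytes = limitBytes; }

        /** Records the reservation and returns true iff it fits the budget. */
        synchronized boolean tryReserve(long bytes) {
            if (reservedBytes + bytes > limitBytes) {
                return false;
            }
            reservedBytes += bytes;
            return true;
        }

        synchronized void release(long bytes) {
            reservedBytes = Math.max(0, reservedBytes - bytes);
        }

        synchronized long reserved() { return reservedBytes; }
    }

    /** Toy buffering operator: keeps batches in memory while the budget
     *  allows, otherwise "spills" them (here, just counts spilled bytes). */
    static final class BufferingOperator {
        long inMemoryBytes;
        long spilledBytes;

        void consumeBatch(long batchSizeBytes, MemoryBudget budget) {
            if (budget.tryReserve(batchSizeBytes)) {
                inMemoryBytes += batchSizeBytes;  // fits: keep in memory
            } else {
                spilledBytes += batchSizeBytes;   // would exceed budget: spill
            }
        }
    }

    public static void main(String[] args) {
        MemoryBudget budget = new MemoryBudget(100);  // 100-byte budget
        BufferingOperator op = new BufferingOperator();
        op.consumeBatch(60, budget);  // fits within the budget
        op.consumeBatch(60, budget);  // 120 > 100, so this batch spills
        System.out.println(op.inMemoryBytes + " " + op.spilledBytes);
    }
}
```

Note that this contract only works if batch sizes are bounded (the batch-size task above): if a single incoming batch can exceed the whole budget, the operator has no way to obey it.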

h2. Phase 2: Running Multiple Queries Concurrently Without Running Out of Memory

h3. Goal

The goal here would be to run multiple concurrent queries successfully in all 
cases. However, before we can even think about *Phase 2* we must first solve 
the issues outlined in *Phase 1*.

h3. Theory

This will require significant changes to Drill's architecture. This is because 
we will have to solve the problem of distributed resource management in order 
to effectively allocate resources to concurrent queries without exceeding the 
resources we have in our cluster. 

h4. Desired State

The good news is that existing cluster managers like YARN already solve this 
problem for batch jobs by doing the following:

# The amount of available memory and cpu cores is reported to a resource 
manager.
# New jobs are submitted to the resource manager. A job includes a description 
of all the containers it needs to run. Each container description also includes 
the amount of memory and cpu it will need.
# The resource manager places the job in a Queue.
# The resource manager uses a scheduler to prioritize jobs in the Queue.
# A job with high priority is scheduled to run on the cluster if and only if 
the cluster has enough resources to execute the job.
# When a job is deployed to the cluster, the remaining unused resources on the 
cluster are updated appropriately.
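The six steps above can be sketched as a toy resource manager: capacity is reported once, jobs declare their resource needs up front, a FIFO queue holds pending jobs, and a job is deployed iff its full request fits the remaining capacity. This is an illustrative assumption of how such a component could look, not YARN's or Drill's actual code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the YARN-style flow described above.
// All names (JobRequest, ResourceManager, ...) are illustrative only.
public class AdmissionSketch {

    /** A job's declared resource needs (step 2 above). */
    static final class JobRequest {
        final String name;
        final long memoryGb;
        final int cores;
        JobRequest(String name, long memoryGb, int cores) {
            this.name = name; this.memoryGb = memoryGb; this.cores = cores;
        }
    }

    /** Minimal resource manager: one FIFO queue, all-or-nothing admission. */
    static final class ResourceManager {
        private long freeMemoryGb;  // step 1: reported cluster capacity
        private int freeCores;
        private final Queue<JobRequest> queue = new ArrayDeque<>(); // step 3

        ResourceManager(long memoryGb, int cores) {
            this.freeMemoryGb = memoryGb;
            this.freeCores = cores;
        }

        void submit(JobRequest job) { queue.add(job); }  // step 2

        /** Steps 4-6: deploy the head of the queue iff its resources fit. */
        JobRequest tryScheduleNext() {
            JobRequest next = queue.peek();
            if (next == null || next.memoryGb > freeMemoryGb
                    || next.cores > freeCores) {
                return null;                  // not enough room: job waits
            }
            queue.remove();
            freeMemoryGb -= next.memoryGb;    // step 6: update remaining
            freeCores -= next.cores;
            return next;
        }

        /** Job finished: return its resources to the pool. */
        void release(JobRequest job) {
            freeMemoryGb += job.memoryGb;
            freeCores += job.cores;
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager(100, 32);
        rm.submit(new JobRequest("A", 80, 16));
        rm.submit(new JobRequest("B", 80, 16));
        JobRequest a = rm.tryScheduleNext();  // A runs: 80 GB fits in 100 GB
        JobRequest b = rm.tryScheduleNext();  // null: only 20 GB remain
        rm.release(a);                        // A finishes
        b = rm.tryScheduleNext();             // now B can run
        System.out.println(b.name);
    }
}
```

A real scheduler would replace the FIFO queue with priorities and fairness policies (step 4), but the key invariant is the same: deployed jobs never collectively exceed reported capacity.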

h4. Current State

Currently Drill does not do any of this. Instead it does the following:

# A query is sent to Drill.
# A foreman is created for the query.
# The query is planned.
# During planning, each query assumes it has access to all of the cluster 
resources and is oblivious to any other queries running on the cluster.

Because of this the following issues can occur:

* Queries sporadically run out of memory when too many concurrent queries are 
run. For example, assume we have a cluster with 100 GB of memory. We run 
*Query A* and it consumes 80 GB of memory on the cluster. While *Query A* 
is running, we try to run *Query B*. During the planning of *Query B*, Drill 
is completely unaware that *Query A* is running and assumes that *Query B* has 
the full 100 GB at its disposal. So Drill may launch *Query B* and give it 80 
GB as well. Now there are two queries with 80 GB allocated to each of them, for 
a total of 160 GB, when the cluster only has 100 GB.
* Even if we add smart heuristics to avoid the first issue, a resource 
deadlock can still occur. For example, *Query A* could be partially deployed 
to the cluster and take up half of the cluster resources, and similarly *Query 
B* could be partially deployed and take up the other half. In this case both 
*Query A* and *Query B* will wait for the remaining resources they need, but 
they will never get them unless one of them is cancelled.
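The deadlock case above comes down to incremental versus atomic resource grants. The following toy simulation (illustrative names only, not Drill code) shows two queries that each need 80 GB of a 100 GB cluster: with incremental grants, each ends up holding 50 GB and neither can finish; with an all-or-nothing grant, the second query simply waits.

```java
// Hypothetical sketch of the partial-deployment deadlock described above,
// and how atomic (all-or-nothing) grants avoid it. Not actual Drill code.
public class DeadlockSketch {

    static final class Cluster {
        long freeGb;
        Cluster(long freeGb) { this.freeGb = freeGb; }

        /** Incremental grant: hands out whatever is free, up to 'wantedGb'. */
        long grabIncrementally(long wantedGb) {
            long granted = Math.min(wantedGb, freeGb);
            freeGb -= granted;
            return granted;
        }

        /** Atomic grant: the full 'wantedGb', or nothing at all. */
        boolean grabAtomically(long wantedGb) {
            if (wantedGb > freeGb) {
                return false;
            }
            freeGb -= wantedGb;
            return true;
        }
    }

    public static void main(String[] args) {
        // Incremental grants: A and B each need 80 GB of a 100 GB cluster.
        Cluster c1 = new Cluster(100);
        long aHeld = c1.grabIncrementally(50);  // A's first fragments: 50 GB
        long bHeld = c1.grabIncrementally(50);  // B's first fragments: 50 GB
        aHeld += c1.grabIncrementally(30);      // A needs 30 more: gets 0
        bHeld += c1.grabIncrementally(30);      // B needs 30 more: gets 0
        // Each holds 50 of the 80 it needs; neither can run to completion,
        // so neither releases anything: deadlock.

        // Atomic grants on a fresh cluster: no deadlock is possible.
        Cluster c2 = new Cluster(100);
        boolean aRuns = c2.grabAtomically(80);  // true: A gets its full 80 GB
        boolean bRuns = c2.grabAtomically(80);  // false: B waits, holding nothing
        System.out.println(aHeld + " " + bHeld + " " + aRuns + " " + bRuns);
    }
}
```

Because a query holding a partial grant never releases it, only the atomic scheme guarantees forward progress; this is exactly why the resource manager in the desired state deploys a job only when all of its resources are available.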

In summary, we need a true distributed resource management system, like the 
YARN approach described above, to effectively resolve these issues. *Note:* I 
am not saying we should use YARN itself for this purpose; I am saying that we 
will have to solve the same problem and will need a similar architecture.

h3. Short Term Actionable Items

We can start working on the tasks outlined in *Phase 1* in order to 
incrementally step towards a solution to this problem. But we will continue to 
see this issue pop up until all of the issues and tasks for both *Phase 1* and 
*Phase 2* outlined above are addressed.



> Drill Resource Management
> -------------------------
>
>                 Key: DRILL-5942
>                 URL: https://issues.apache.org/jira/browse/DRILL-5942
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>
> OutOfMemoryExceptions still occur in Drill. This ticket addresses a plan for 
> what it would take to ensure all queries are able to execute on Drill without 
> running out of memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
