[
https://issues.apache.org/jira/browse/DRILL-5942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242689#comment-16242689
]
Timothy Farkas commented on DRILL-5942:
---------------------------------------
My initial analysis of this issue is as follows:
This is a very complex issue which will take a considerable amount of time to
solve correctly. Rehashing the points that Paul has already made in various
discussions, I think this needs to be tackled in two main phases.
h2. Phase 1: Running Queries Non-Concurrently Without Running Out of Memory
h3. Goal
The goal here would be to run one query at a time successfully in all cases. I
think this is possible to achieve with incremental improvements to the existing
architecture. *Note:* I think achieving *Phase 2* will require significant
changes to Drill's architecture.
h3. Tasks
In order to avoid out-of-memory exceptions in the single-query case, it is
necessary and sufficient to solve all of the following sub-tasks.
* Make each operator memory aware. Given a specific memory budget, each
operator must be capable of obeying it. All of the operators need to be
analyzed and made memory aware (see the first sketch after this list).
* *Relevant Pending Work:* The HashJoin work Boaz is doing.
* Account for the memory used by Drillbit-to-Drillbit communication.
Currently exchanges use buffers on both the sending and receiving Drillbits.
These buffers can use a significant amount of memory. We would have to be able
to set a limit on the amount of memory used by these buffers and make the
exchange operations smart enough to obey the limit (see the backpressure
sketch after this list).
* *Relevant Pending Work:* The exchange operator work Vlad is doing.
* Make Drill aware of the amount of direct memory allocated to the JVM. This
is necessary because the Drillbit needs to know how much memory it has
available to allocate to operators and buffers (see the direct-memory probe
after this list).
* Control batch sizes. We cannot effectively obey memory limits in the
operators and buffers unless we can bound the size of batches (see the
batch-sizing sketch after this list).
* *Relevant Pending Work:* The work Paul is doing to limit batch sizes.
* Once everything above is satisfied, we need to test the Parallelizer code
to make sure it doesn't over-allocate memory. I believe there are cases where
incorrect configuration can cause the Parallelizers to grossly over-allocate
memory.
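To make the first task above concrete, here is a minimal sketch of what
"obeying a budget" could look like for a buffering operator such as a sort.
The class and method names are hypothetical, not Drill's actual operator
interfaces:
{code:java}
// Minimal sketch of a budget-aware buffering operator (e.g. a sort).
// The names here are hypothetical, not Drill's actual operator interfaces.
public class BudgetAwareSortOperator {
  private final long memoryBudgetBytes; // budget assigned by the planner
  private long bytesBuffered;           // memory currently held by this operator

  public BudgetAwareSortOperator(long memoryBudgetBytes) {
    this.memoryBudgetBytes = memoryBudgetBytes;
  }

  /** Buffer an incoming batch, spilling first if it would exceed the budget. */
  public void consumeBatch(long incomingBatchBytes) {
    if (bytesBuffered + incomingBatchBytes > memoryBudgetBytes) {
      spillToDisk(); // release in-memory state before accepting more input
    }
    bytesBuffered += incomingBatchBytes;
  }

  private void spillToDisk() {
    // write buffered runs to disk and free their memory (omitted)
    bytesBuffered = 0;
  }
}
{code}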
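For the exchange task, one way to bound buffer memory is sender-side
backpressure. The following is a sketch of that idea under the assumption of a
per-sender byte limit; it is not a description of Drill's actual RPC layer:
{code:java}
import java.util.concurrent.Semaphore;

// Minimal sketch of capping sender-side exchange memory with backpressure:
// a semaphore tracks bytes in flight and blocks the sender at the limit.
// This illustrates the idea only; it is not Drill's actual RPC layer.
public class BoundedExchangeSender {
  private final Semaphore inFlightBytes;

  public BoundedExchangeSender(int bufferLimitBytes) {
    this.inFlightBytes = new Semaphore(bufferLimitBytes);
  }

  /** Blocks once the configured buffer limit is reached. */
  public void send(byte[] batch) throws InterruptedException {
    inFlightBytes.acquire(batch.length); // wait for room under the limit
    transmitAsync(batch);
  }

  /** Called when the receiver acknowledges a batch, freeing its budget. */
  void onAck(int batchLength) {
    inFlightBytes.release(batchLength);
  }

  private void transmitAsync(byte[] batch) {
    // hand off to the network layer; onAck(batch.length) runs on completion (omitted)
  }
}
{code}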
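For the direct-memory task, standard JMX beans can at least report current
direct-buffer usage, as the sketch below shows; the configured maximum (set
via -XX:MaxDirectMemorySize) would still have to be surfaced in Drill's
configuration separately:
{code:java}
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch of observing direct memory through the standard JMX
// buffer-pool bean. This reports current usage of the "direct" pool only;
// the configured maximum (-XX:MaxDirectMemorySize) still has to be made
// known to Drill separately.
public class DirectMemoryProbe {
  public static void main(String[] args) {
    for (BufferPoolMXBean pool :
        ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if ("direct".equals(pool.getName())) {
        System.out.printf("direct buffers: used=%d bytes, capacity=%d bytes%n",
            pool.getMemoryUsed(), pool.getTotalCapacity());
      }
    }
  }
}
{code}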
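Finally, for batch sizing, the core arithmetic is straightforward once we can
estimate an average row width. The budget and row width below are illustrative
values only:
{code:java}
// Minimal sketch of bounding batch size, assuming we can estimate the
// average row width for a schema. The budget and width are illustrative.
public final class BatchSizer {
  /** Cap the row count of a batch so it stays under a byte budget. */
  public static int maxRowsPerBatch(long batchBudgetBytes, int estimatedRowWidthBytes) {
    return (int) Math.max(1, batchBudgetBytes / estimatedRowWidthBytes);
  }

  public static void main(String[] args) {
    // e.g. a 16 MB batch budget with ~200-byte rows allows ~83,000 rows per batch
    System.out.println(maxRowsPerBatch(16L * 1024 * 1024, 200));
  }
}
{code}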
h2. Phase 2: Running Multiple Queries Concurrently Without Running Out of Memory
h3. Goal
The goal here would be to run multiple concurrent queries successfully in all
cases. However, before we can even think about *Phase 2* we must first solve
the issues outlined in *Phase 1*.
h3. Theory
This will require significant changes to Drill's architecture. This is because
we will have to solve the problem of distributed resource management in order
to effectively allocate resources to concurrent queries without exceeding the
resources we have in our cluster.
h4. Desired State
The good news is that existing cluster managers like YARN already solve this
problem for batch jobs by doing the following:
# The amount of available memory and CPU cores is reported to a resource
manager.
# New jobs are submitted to the resource manager. A job includes a description
of all the containers it needs to run. Each container description also includes
the amount of memory and CPU it will need.
# The resource manager places the job in a queue.
# The resource manager uses a scheduler to prioritize jobs in the queue.
# A job with high priority is scheduled to run on the cluster if and only if
the cluster has enough resources to execute the job.
# When a job is deployed to the cluster, the remaining unused resources on the
cluster are updated appropriately.
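As a minimal sketch of that admission-control loop, assume a single resource
dimension (memory) and a FIFO queue for brevity; a real resource manager would
also track CPU and use a pluggable scheduler:
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of the admission-control loop above, tracking a single
// resource dimension (memory) with a FIFO queue for brevity. A real resource
// manager would also track CPU and use a pluggable scheduler (steps 3-4).
public class AdmissionController {
  private long freeMemoryBytes;                                // step 1: reported capacity
  private final Queue<Long> pendingJobs = new ArrayDeque<>();  // step 3: queued memory requests

  public AdmissionController(long clusterMemoryBytes) {
    this.freeMemoryBytes = clusterMemoryBytes;
  }

  /** Step 2: a job arrives with its declared memory requirement. */
  public synchronized void submit(long requiredBytes) {
    pendingJobs.add(requiredBytes);
    tryAdmit();
  }

  /** Steps 5-6: admit the head of the queue only if the cluster can fit it. */
  private void tryAdmit() {
    while (!pendingJobs.isEmpty() && pendingJobs.peek() <= freeMemoryBytes) {
      freeMemoryBytes -= pendingJobs.poll(); // reserve before deploying
      // deploy(job) would go here
    }
  }

  /** When a job finishes, return its reservation and re-check the queue. */
  public synchronized void complete(long reservedBytes) {
    freeMemoryBytes += reservedBytes;
    tryAdmit();
  }
}
{code}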
h4. Current State
Currently Drill does not do any of this. Instead it does the following:
# A query is sent to Drill.
# A foreman is created for the query.
# The query is planned.
# During planning, each query assumes it has access to all of the cluster
resources and is oblivious to any other queries running on the cluster.
Because of this, the following issues can occur:
* Queries sporadically run out of memory when too many concurrent queries are
run. For example, assume we have a cluster with 100 GB of memory. Let's run
*Query A* and assume it consumes 80 GB of memory on the cluster. While *Query A*
is running, let's try to run *Query B*. During the planning of *Query B*, Drill
is completely unaware that *Query A* is running and assumes that *Query B* has
the full 100 GB at its disposal. So Drill may launch *Query B* and give it 80
GB. Now there are two queries with 80 GB allocated to each of them, for a total
of 160 GB, when the cluster only has 100 GB.
* Even if we try to have some smart heuristics to avoid the first issue, a
resource deadlock can occur. For example, *Query A* could be partially deployed
to the cluster and take up half of the cluster resources; similarly, *Query B*
could be partially deployed to the cluster and take up the other half of the
cluster resources. In this case, both *Query A* and *Query B* will wait to get
the remaining resources they need, but they will never get them unless one of
them is cancelled.
In summary, we need a true distributed resource management system like the YARN
model described above to effectively resolve these issues. *Note:* I am not
saying we should use YARN for this purpose; I am just saying that we will have
to solve the same problem and will have to use a similar architecture.
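For contrast, here is the 100 GB scenario above run through the
AdmissionController sketch from the previous section. Because admission is
all-or-nothing, *Query B* simply waits in the queue rather than partially
deploying and deadlocking:
{code:java}
// Worked version of the 100 GB scenario, reusing the AdmissionController
// sketch from the previous section. Because admission is all-or-nothing,
// Query B waits in the queue instead of partially deploying and deadlocking.
public class OversubscriptionExample {
  public static void main(String[] args) {
    final long GB = 1L << 30;
    AdmissionController rm = new AdmissionController(100 * GB);
    rm.submit(80 * GB);   // Query A is admitted; 20 GB remain free
    rm.submit(80 * GB);   // Query B queues: it needs 80 GB but only 20 GB are free
    rm.complete(80 * GB); // Query A finishes; Query B is now admitted in full
  }
}
{code}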
h3. Short Term Actionable Items
We can start working on the tasks outlined in *Phase 1* in order to
incrementally step towards a solution to this problem. But we will continue to
see this issue pop up until all of the issues / tasks for *Phase 1* and
*Phase 2* outlined above are addressed.
> Drill Resource Management
> -------------------------
>
> Key: DRILL-5942
> URL: https://issues.apache.org/jira/browse/DRILL-5942
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Timothy Farkas
>
> OutOfMemoryExceptions still occur in Drill. This ticket addresses a plan for
> what it would take to ensure all queries are able to execute on Drill without
> running out of memory.