This is an automated email from the ASF dual-hosted git repository.
bridgetb pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/drill.git
The following commit(s) were added to refs/heads/gh-pages by this push:
new 896d4dd edits
896d4dd is described below
commit 896d4ddfd7b7b25fcee8bc75a2a0f8e185c1ca0f
Author: Bridget Bevens <[email protected]>
AuthorDate: Wed Sep 5 18:45:57 2018 -0700
edits
---
...-and-hash-based-memory-constrained-operators.md | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
index c7e658b..8047638 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
@@ -1,6 +1,6 @@
---
title: "Sort-Based and Hash-Based Memory-Constrained Operators"
-date: 2018-09-06 01:31:07 UTC
+date: 2018-09-06 01:45:58 UTC
parent: "Query Plans and Tuning"
---
@@ -11,12 +11,9 @@ Drill supports the following memory-intensive operators, which can temporarily s
- Hash-Aggregate
Drill only uses the External Sort operator to sort data. Drill uses the
Hash-Aggregate operator to aggregate data. Alternatively, Drill can sort the
data and then use the (lightweight) Streaming-Aggregate operator to aggregate
data.
-Drill uses the Hash-Join operator to join data. Alternatively, Drill can use
the Nested-Loop-Join or sort the data and then use the (lightweight)
Merge-Join. Drill typically uses Hash operators for joining and aggregation, as
they perform better than the Sort operator (Hash - O(N) vs. Sort - O(N *
log(N))). However, if the Hash operators are disabled, or the data is already
sorted, Drill uses the alternative methods previously described.
-
-The memory configuration in Drill is specified as the memory limit per-query,
per-node. The allocated memory is equally divided among all instances of the
spillable operators (per query on each node). The number of instances is the
number of spillable operators in the query plan multiplied by the maximal
degree of parallelism. The maximal degree of parallelism is the number of minor
fragments required to perform the work for each instance of a spillable
operator. When an instance of a sp [...]
-
-To see the difference in memory consumption between the operators, run a query
and then view the query profile in the Drill Web UI. Optionally, you can
disable the Hash operators, which forces Drill to use the Merge-Join and
Streaming-Aggregate operators.
+Drill uses the Hash-Join operator to join data. Alternatively, Drill can use
the Nested-Loop-Join or sort the data and then use the (lightweight)
Merge-Join. Drill typically uses Hash operators for joining and aggregation, as
they perform better than the Sort operator (Hash - O(N) vs. Sort - O(N *
log(N))). However, if you disable the Hash operators, or the data is already
sorted, Drill uses the alternative methods previously described.
+The memory configuration in Drill is specified as the memory limit per-query,
per-node. The allocated memory is equally divided among all instances of the
spillable operators (per query on each node). The number of instances is the
number of spillable operators in the query plan multiplied by the maximal
degree of parallelism. The maximal degree of parallelism is the number of minor
fragments required to perform the work for each instance of a spillable
operator. When an instance of a sp [...]
##Spill to Disk
@@ -32,7 +29,7 @@ Ideally, you want to allocate enough memory for Drill to perform all operations
Spillable operators write data to a temporary work area on disk when they
cannot process all of the data in memory. The default location of the temporary
work area is `/tmp/drill/spill` on the local file system.
-The `/tmp/drill/spill` directory should suffice for small workloads or
examples, however it is highly recommended that you redirect the default spill
location to a location with enough disk space to support spilling for large
workloads.
+The `/tmp/drill/spill` directory should suffice for small workloads or
examples; however, you should redirect the default spill location to a location
with enough disk space to support spilling for large workloads.
**Note:** Spilled data may require more space than the table referenced in the
query that is spilling the data. For example, when the underlying table is
compressed (Parquet), or when the operator received data joined from multiple
tables.
@@ -40,14 +37,15 @@ When you configure the spill location, you can specify a single directory or a l
**Configuring Spill to Disk**
-The `drill-override.conf` file, located in the `/conf` directory, contains
options that set the spill locations for the Hash and Sort operators. An
administrator can change the file system and directories into which the
operators spill data. Refer to the `drill-override-example.conf` file included
in the `/conf` directory for examples.
+The `drill-override.conf` file, located in the `/conf` directory, contains
options that set the spill locations for the spillable operators. An
administrator can change the file system and directories into which the
operators spill data. Refer to the `drill-override-example.conf` file included
in the `/conf` directory for examples.
The following list describes the spill to disk configuration options:
- **drill.exec.spill.fs**
-Introduced in Drill 1.11. The default file system on the local machine into
which the Sort, Hash Aggregate, and Hash Join operators spill data. You can
configure this option so that data spills into a distributed file system, such
as hdfs. For example, "hdfs:///". The default setting is "file:///".
+Introduced in Drill 1.11. The default file system on the local machine into
which the spillable operators spill data. You can configure this option so that
data spills into a distributed file system, such as hdfs. For example,
"hdfs:///". The default setting is "file:///".
+
- **drill.exec.spill.directories**
-Introduced in Drill 1.11. The list of directories into which the Sort, Hash
Aggregate, and Hash Join operators spill data. The list must be an array with
directories separated by a comma, for example ["/fs1/drill/spill" ,
"/fs2/drill/spill" , "/fs3/drill/spill"]. The default setting is
["/tmp/drill/spill"].
+Introduced in Drill 1.11. The list of directories into which the spillable
operators spill data. The list must be an array with directories separated by a
comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" ,
"/fs3/drill/spill"]. The default setting is ["/tmp/drill/spill"].
**Note:** The following options were available prior to Drill 1.11, but have
since been deprecated and replaced with the options described above:
@@ -58,7 +56,7 @@ Introduced in Drill 1.11. The list of directories into which the Sort, Hash Aggr
##Memory Allocation
-Drill evenly splits the available memory among all instances of the spillable
operators. When a query is parallelized, the number of operators is multiplied,
which reduces the amount of memory given to each instance of the operators
during a query.
+Drill evenly splits the available memory among all instances of the spillable
operators. When a query is parallelized, the number of operators is multiplied,
which reduces the amount of memory given to each instance of the operators
during a query. To see the difference in memory consumption between the
operators, you can run a query and then view the query profile in the Drill Web
UI. Optionally, you can disable the Hash operators, which forces Drill to use
the Merge-Join and Streaming- [...]
**Memory Allocation Configuration Options**
@@ -68,7 +66,7 @@ The `planner.memory.max_query_memory_per_node` and `planner.memory.percent_per_q
The `planner.memory.max_query_memory_per_node` option is the minimum amount of
memory available to Drill per query on a node. The default of 2 GB typically
allows between two and three concurrent queries to run when the JVM is
configured to use 8 GB of direct memory (default). When the memory requirement
for Drill increases, the default of 2 GB is constraining. You must increase the
amount of memory for queries to complete, unless the setting for the
`planner.memory.percent_per_query` op [...]
- **planner.memory.percent\_per_query**
-Alternatively, the `planner.memory.percent_per_query` option sets the memory
as a percentage of the total direct memory. The default is 5%. This value is
only used when throttling is disabled. Setting the value to 0 disables the
option. You can increase or decrease the value, however you should set the
percentage well below the JVM direct memory to account for the cases where
Drill does not manage memory, such as for the less memory intensive operators.
+Alternatively, the `planner.memory.percent_per_query` option sets the memory
as a percentage of the total direct memory. The default is 5%. This value is
only used when throttling is disabled. Setting the value to 0 disables the
option. You can increase or decrease the value; however, you should set the
percentage well below the JVM direct memory to account for the cases where
Drill does not manage memory, such as for the less memory intensive operators.
- The percentage is calculated using the following formula: