Repository: drill
Updated Branches:
  refs/heads/gh-pages c533e56bf -> ccd89314c


doc updates for Drill 1.13


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/ccd89314
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/ccd89314
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/ccd89314

Branch: refs/heads/gh-pages
Commit: ccd89314cc0d70894f44f4687f6a4e1ede1b2aec
Parents: c533e56
Author: Bridget Bevens <bbev...@maprtech.com>
Authored: Tue Mar 13 17:58:04 2018 -0700
Committer: Bridget Bevens <bbev...@maprtech.com>
Committed: Tue Mar 13 17:58:04 2018 -0700

----------------------------------------------------------------------
 .../020-configuring-drill-memory.md             |  53 ++++----
 .../010-configuration-options-introduction.md   |  12 +-
 .../020-start-up-options.md                     |   6 +-
 ...d-hash-based-memory-constrained-operators.md | 134 +++++++++++--------
 team.md                                         |   1 +
 5 files changed, 118 insertions(+), 88 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/configure-drill/020-configuring-drill-memory.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/020-configuring-drill-memory.md b/_docs/configure-drill/020-configuring-drill-memory.md
index e564e5b..f2c4122 100644
--- a/_docs/configure-drill/020-configuring-drill-memory.md
+++ b/_docs/configure-drill/020-configuring-drill-memory.md
@@ -1,36 +1,24 @@
 ---
 title: "Configuring Drill Memory"
-date: 2018-01-30 05:41:06 UTC
+date: 2018-03-14 00:58:05 UTC
 parent: "Configure Drill"
 ---
 
-You can configure the amount of direct memory allocated to a Drillbit for 
query processing in any Drill cluster, multitenant or not. The default memory 
for a drillbit is 8G, but Drill prefers 16G or more depending on the workload. 
The total amount of direct memory that a drillbit allocates to query operations 
cannot exceed the limit set.
+Drill uses Java direct memory. You can configure the amount of direct memory 
allocated to a Drillbit for query processing. The default memory for a Drillbit 
is 8G, but Drill prefers 16G or more depending on the workload. The total 
amount of direct memory that a Drillbit allocates to query operations cannot 
exceed the limit set.
 
-Drill uses Java direct memory and performs well when executing operations in 
memory instead of storing the operations on disk. Drill does not write to disk 
unless absolutely necessary, unlike MapReduce where everything is written to 
disk during each phase of a job.
+Drill performs well when executing operations in memory instead of storing the 
operations on disk. Drill does not write to disk unless absolutely necessary, 
unlike MapReduce where everything is written to disk during each phase of a job.
 
-The JVM’s heap memory does not limit the amount of direct memory available in
-a drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 
4), which should
-suffice because Drill avoids having data sit in heap memory.
+The JVM heap memory does not limit the amount of direct memory available in a 
Drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), 
which should
+suffice because Drill avoids having data sit in heap memory.  
 
-As of Drill 1.5, Drill uses a new allocator that improves an operator’s use 
of direct memory and tracks the memory use more accurately. Due to this change, 
the sort operator (in queries that ran successfully in previous releases) may 
not have enough memory, resulting in a failed query and out of memory error 
instead of spilling to disk.     
+The following sections describe how to modify the memory allocated to each 
Drillbit and queries:  
 
+## Modifying Memory Allocated to a Drillbit  
 
-## Drillbit Memory  
-The value set for the 
[`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options)
 system option sets the maximum amount of direct memory allocated to the Sort 
and Hash Aggreate operators in each query on a node. If a query plan contains 
multiple Sort and/or Hash Aggregate operators, they all share this memory. The 
default limit is set to 2147483648 bytes (2GB), which should be increased for 
queries on large data sets. If you encounter memory issues when running queries 
with Sort and/or Hash Aggregate operators, increase the value of this option. 
See [Sort-Based and Hash-Based Memory Constrained 
Operators](https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/)
 for more information.  
+Modify the memory allocated to each Drillbit in a cluster in the Drillbit 
startup script, `<drill_installation_directory>/conf/drill-env.sh`. You must 
[restart Drill]({{ site.baseurl }}/docs/starting-drill-in-distributed-mode) 
after you modify the script.
 
-If you continue to encounter memory issues after increasing this value, you 
can also reduce the value of the 
[`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/)
 option to reduce the level of parallelism per node. However, this may increase 
the amount of time required for a query to complete. 
+{% include startnote.html %}If DRILL_MAX_DIRECT_MEMORY is not set, the limit 
depends on the amount of available direct memory.{% include endnote.html %}
 
-###Modifying Drillbit Memory
-
-You can modify memory for each drillbit node in your cluster. To modify the 
memory for a drillbit, set the DRILL_MAX_DIRECT_MEMORY variable in the drillbit 
startup script, `drill-env.sh`, located in 
`<drill_installation_directory>/conf`, as follows:
-
-    export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"<value>"}
-
-{% include startnote.html %}If DRILL_MAX_DIRECT_MEMORY is not set, the limit 
depends on the amount of available system memory.{% include endnote.html %}
-
-After you edit `<drill_installation_directory>/conf/drill-env.sh`, [restart 
the drillbit]({{ site.baseurl }}/docs/starting-drill-in-distributed-mode) on 
the node.
-
-### About the Drillbit Startup Script
 
 The `drill-env.sh` file contains the following options:
 
@@ -57,8 +45,25 @@ As of Drill 1.13, bounds checking for direct memory is 
disabled by default. To e
 For earlier versions of Drill (prior to 1.13), bounds checking is enabled by 
default. To disable bounds checking, set the 
`drill.enable_unsafe_memory_access` parameter to true, as shown:  
 
 
-    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true"
-  
-  
+    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true"  
+
+
+## Modifying Memory Allocated to Queries  
+
+You can configure the amount of memory that Drill allocates to each query as a 
hard limit or a percentage of the total direct memory. The 
`planner.memory.max_query_memory_per_node` and 
`planner.memory.percent_per_query` options set the amount of memory that Drill 
can allocate to a query on a node. Both options are enabled by default. Of 
these two options, Drill picks the setting that provides the most memory. For 
more information about these options, see [Sort-Based and Hash-Based Memory 
Constrained 
Operators](https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/).
  
+
+
+If you modify the memory allocated per query and continue to experience 
out-of-memory errors, you can try reducing the value of the 
[`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/)
 option. Reducing the value of this option reduces the level of parallelism per 
node. However, this may increase the amount of time required for a query to 
complete.  
+
+Another option you can modify is the 
`drill.exec.memory.operator.output_batch_size` option, introduced in Drill 
1.13. The  `drill.exec.memory.operator.output_batch_size` option limits the 
amount of memory that the Flatten, Merge Join, and External Sort operators 
allocate to outgoing batches. Limiting the memory allocated to outgoing batches 
can improve concurrency and prevent queries from failing with out-of-memory 
errors.
+ 
+The average row size of the outgoing batch (calculated from the incoming batch 
size) determines the number of rows that can fit into the available memory for 
the batch. If your queries fail with memory errors, reduce the value of the 
`drill.exec.memory.operator.output_batch_size` option to reduce the output 
batch size. 
+
+The default value is 16777216 (16 MB). The maximum allowed value is 536870912 
(512 MB). Enter the value in bytes. 
+
+**Note:** Configuring a batch size less than 1 MB is not recommended, as it 
could lead to performance issues. 
 
+Use the ALTER SYSTEM SET command to change the settings, as shown:  
 
+       ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = <value>;
+  
\ No newline at end of file
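
Reviewer note on the memory options documented above: the two sizing rules in the new text (Drill picks whichever per-query setting provides the most memory, and the average row size determines how many rows fit in an output batch) can be sketched as simple arithmetic. This is a simplified illustration, not Drill's actual implementation. The function names are hypothetical, and treating `planner.memory.percent_per_query` as a fraction of total direct memory is an assumption. The 16 MB default and 512 MB maximum come from the text above.

```python
# Simplified sketch of the sizing rules described in the doc text.
# Function names are hypothetical; this is not Drill source code.

MIB = 1024 * 1024
DEFAULT_OUTPUT_BATCH_SIZE = 16_777_216  # 16 MB, default stated in the doc
MAX_OUTPUT_BATCH_SIZE = 536_870_912     # 512 MB, maximum stated in the doc
MIN_RECOMMENDED_BATCH_SIZE = 1 * MIB    # below this the doc warns of performance issues


def effective_query_memory(max_query_memory_per_node: int,
                           percent_per_query: float,
                           total_direct_memory: int) -> int:
    """Return the larger of the hard limit and the percentage-based limit.

    Assumption: percent_per_query is a fraction (for example 0.05 for 5%).
    """
    percent_based = int(total_direct_memory * percent_per_query)
    return max(max_query_memory_per_node, percent_based)


def rows_per_batch(output_batch_size: int, avg_row_size: int) -> int:
    """Estimate how many rows of a given average size fit in one output batch."""
    if not 0 < output_batch_size <= MAX_OUTPUT_BATCH_SIZE:
        raise ValueError("output_batch_size must be in (0, 512 MB]")
    if avg_row_size <= 0:
        raise ValueError("avg_row_size must be positive")
    # Always allow at least one row, even if a single row exceeds the budget.
    return max(1, output_batch_size // avg_row_size)


if __name__ == "__main__":
    gib = 1024 ** 3
    # With 8G of direct memory, a 2 GB hard limit exceeds a 5% share.
    print(effective_query_memory(2 * gib, 0.05, 8 * gib))
    # Rows of about 1 KB in the default 16 MB batch.
    print(rows_per_batch(DEFAULT_OUTPUT_BATCH_SIZE, 1024))
```

Under these assumptions, lowering `drill.exec.memory.operator.output_batch_size` reduces the row count per batch proportionally, which is why the text suggests reducing it when queries fail with memory errors.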

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
index 9f28eeb..08352e0 100644
--- a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
+++ b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
@@ -1,6 +1,6 @@
 ---
 title: "Configuration Options Introduction"
-date: 2018-02-05 23:56:13 UTC
+date: 2018-03-14 00:58:05 UTC
 parent: "Configuration Options"
 ---
 
@@ -13,22 +13,25 @@ See [Configuration and Launch Script 
Changes]({{site.baseurl}}/docs/apache-drill
 The sys.options table contains information about system and session options. 
The sys.boot table contains information about Drill start-up options. The 
section, ["Start-up Options"]({{site.baseurl}}/docs/start-up-options), covers 
how to configure and view key boot options. The following table lists the 
system options in alphabetical order and provides a brief description of 
supported options.
 
 ## System Options
-The sys.options table lists ptions that you can set at the system or session 
level, as described in the section, ["Planning and Execution 
Options"]({{site.baseurl}}/docs/planning-and-execution-options).  
+The sys.options table lists options that you can set at the system or session 
level, as described in the section, ["Planning and Execution 
Options"]({{site.baseurl}}/docs/planning-and-execution-options).  
 
-| **Name**                                              | **Default**          
                                 | **Description**                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                       |
+| Name                                              | Default                  
                         | Description                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                           |
 
|---------------------------------------------------|---------------------------------------------------|
 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | drill.exec.default_temporary_workspace            | dfs.tmp                  
                         | Available as of Drill 1.10. Sets the   workspace for 
temporary tables. The workspace must be writable, file-based,   and point to a 
location that already exists. This option requires the   following format: 
.<workspace                                                                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                            
                                                                                
                                                                                
                                                           |
+| drill.exec.memory.operator.output_batch_size      | 16777216   (16 MB)       
                         |       Available as of Drill 1.13. Limits the   
amount of memory that the Flatten, Merge Join, and External Sort operators   
allocate to outgoing batches.                                                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                               
                                                                                
                                                                                
                                                           |
 | drill.exec.storage.implicit.filename.column.label | filename                 
                         | Available as of Drill 1.10. Sets the   implicit 
column name for the filename column.                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                           
                                                                                
                                                                                
                                                           |
 | drill.exec.storage.implicit.filepath.column.label | filepath                 
                         | Available as of Drill 1.10. Sets the   implicit 
column name for the filepath column.                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                           
                                                                                
                                                                                
                                                           |
 | drill.exec.storage.implicit.fqn.column.label      | fqn                      
                         | Available as of Drill 1.10. Sets the   implicit 
column name for the fqn column.                                                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                           
                                                                                
                                                                                
                                                           |
 | drill.exec.storage.implicit.suffix.column.label   | suffix                   
                         | Available as of Drill 1.10. Sets the   implicit 
column name for the suffix column.                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                           
                                                                                
                                                                                
                                                           |
 | drill.exec.functions.cast_empty_string_to_null    | FALSE                    
                         | In a text file, treat empty fields as NULL   values 
instead of empty string.                                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                       
                                                                                
                                                                                
                                                           |
+| drill.exec.spill.fs                               |  "file:///"              
                         | Introduced   in Drill 1.11. The default file system 
on the local machine into which the   Sort, Hash Aggregate, and Hash Join 
operators spill data.                                                           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             
                                                                                
                                                                                
                                                           |
+| drill.exec.spill.directories                      | ["/tmp/drill/spill"]     
                         | Introduced   in Drill 1.11. The list of directories 
into which the Sort, Hash Aggregate,   and Hash Join operators spill data. The 
list must be an array with   directories separated by a comma, for example 
["/fs1/drill/spill" ,   "/fs2/drill/spill" , "/fs3/drill/spill"].               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             
                                                                                
                                                                                
                                                           |
 | drill.exec.storage.file.partition.column.label    | dir                      
                         | The column label for directory levels in   results 
of queries of files in a directory. Accepts a string input.                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                        
                                                                                
                                                                                
                                                           |
 | exec.enable_union_type                            | FALSE                    
                         | Enable support for Avro union type.                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                           |
 | exec.errors.verbose                               | FALSE                    
                         | Toggles verbose output of executable error   
messages                                                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                              
                                                                                
                                                                                
                                                           |
 | exec.java_compiler                                | DEFAULT                  
                         | Switches between DEFAULT, JDK, and JANINO   mode for 
the current session. Uses Janino by default for generated source   code of less 
than exec.java_compiler_janino_maxsize; otherwise, switches to   the JDK 
compiler.                                                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             
                                                                                
                                                                                
                                                           |
 | exec.java_compiler_debug                          | TRUE                     
                         | Toggles the output of debug-level compiler   error 
messages in runtime generated code.                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                        
                                                                                
                                                                                
                                                           |
-| exec.java.compiler.exp_in_method_size             | 50                       
                         | Introduced in Drill 1.8. For queries with complex or 
multiple expressions in the query logic, this option   limits the number of 
expressions allowed in each method to prevent Drill from   generating code that 
exceeds the Java limit of 64K bytes. If a method   approaches the 64K limit, 
the Java compiler returns a message stating that   the code is too large to 
compile. If queries return such a message, reduce   the value of this option at 
the session level. The default value for this option is 50. The value is the 
count of   expressions allowed in a method. Expressions are added to a method 
until they   hit the Java 64K limit, when a new inner method is created and 
called from   the existing method.          **Note:** This logic has not   been 
implemented for all operators. If a query uses operators for which the   logic 
is not implemented, reducing the setting for this option may not   resolve the error. Setting this option at the system level impacts all 
  queries and can degrade query performance.                                    
    |
+| exec.java.compiler.exp_in_method_size             | 50                       
                         | Introduced in Drill 1.8. For queries with   complex 
or multiple expressions in the query logic, this option limits the   number of 
expressions allowed in each method to prevent Drill from generating   code that 
exceeds the Java limit of 64K bytes. If a method approaches the 64K   limit, 
the Java compiler returns a message stating that the code is too large   to 
compile. If queries return such a message, reduce the value of this option   at 
the session level. The default value for this option is 50. The value is   the 
count of expressions allowed in a method. Expressions are added to a   method 
until they hit the Java 64K limit, when a new inner method is created   and 
called from the existing method. Note: This logic has not been implemented for 
all operators. If   a query uses operators for which the logic is not 
implemented, reducing the   setting for this option may not resolve the error. 
Setting this option at the   system level impacts all queries 
and can degrade query performance.                                              
                                                              |
 | exec.java_compiler_janino_maxsize                 | 262144                   
                         | See the exec.java_compiler option comment.   Accepts 
inputs of type LONG.                                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                           |
 | exec.max_hash_table_size                          | 1073741824               
                         | Ending size in buckets for hash tables.   Range: 0 - 
1073741824.                                                                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                           |
 | exec.min_hash_table_size                          | 65536                    
                         | Starting size in buckets for hash tables.   Increase 
according to available memory to improve performance. Increase for   very 
large aggregations or joins when you have large amounts of memory for   Drill 
to use. Range: 0 - 1073741824.                                                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             
                                                                                
                                                                                
                                                           |
@@ -72,6 +75,7 @@ The sys.options table lists options that you can set at the 
system or session lev
 | planner.memory.max_query_memory_per_node          | 2147483648 bytes         
                         | Sets the maximum amount of direct memory   allocated 
to the Sort and Hash Aggregate operators during each query on a   node. This 
memory is split between operators. If a query plan contains   multiple Sort 
and/or Hash Aggregate operators, the memory is divided between   them. The 
default limit should be increased for queries on large data sets.               
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                  
                                                                                
                                                                                
                                                           |
 | planner.memory.non_blocking_operators_memory      | 64                       
                         | Extra query memory per node for non-blocking   
operators. This option is currently used only for memory estimation. Range:   
0-2048 MB                                                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                              
                                                                                
                                                                                
                                                           |
 | planner.memory_limit                              | 268435456 bytes          
                         | Defines the maximum amount of direct memory   
allocated to a query for planning. When multiple queries run concurrently,   
each query is allocated the amount of memory set by this parameter. Increase   
the value of this parameter and rerun the query if partition pruning failed   
due to insufficient memory.                                                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                    
                                                                                
                                                                                
                                                           |
+| planner.memory.percent_per_query                  | 0.05                     
                         | Sets   the memory that a query can use as a percentage of the total 
direct memory.                                                                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                            
                                                                                
                                                                                
                                                           |
 | planner.nestedloopjoin_factor                     | 100                      
                         | A heuristic value for influencing the nested   loop 
join.                                                                           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                       
                                                                                
                                                                                
                                                           |
 | planner.partitioner_sender_max_threads            | 8                        
                         | Upper limit of threads for outbound queuing.         
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                           |
 | planner.partitioner_sender_set_threads            | -1                       
                         | Overwrites the number of threads used to   send out 
batches of records. Set to -1 to disable. Typically not changed.                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                       
                                                                                
                                                                                
                                                           |
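
As a rough illustration of how the planner.memory.max_query_memory_per_node and planner.memory.percent_per_query rows in the table above interact, the following Python sketch models the per-query memory budget on one node and its division among operator instances. It is a simplified model, not Drill's planner code: the function names are hypothetical, and the rules it encodes (take the larger of the two limits, then split evenly across instances) are taken from the option descriptions in this commit.

```python
def per_query_memory(direct_memory_bytes,
                     max_query_memory_per_node=2147483648,
                     percent_per_query=0.05):
    """Return the modeled per-query memory budget on one node, in bytes.

    Assumption: the budget is the larger of the fixed limit and the
    percentage of total direct memory, as the option text describes.
    """
    from_percentage = int(direct_memory_bytes * percent_per_query)
    return max(max_query_memory_per_node, from_percentage)


def per_instance_memory(query_memory_bytes, parallelism_per_operator):
    """Split the query budget evenly among all operator instances.

    parallelism_per_operator holds one entry per memory-intensive
    operator in the plan (Sort, Hash Aggregate, Hash Join); each entry
    is that operator's degree of parallelism (its minor fragments).
    """
    instances = sum(parallelism_per_operator)
    if instances <= 0:
        raise ValueError("the plan needs at least one operator instance")
    return query_memory_bytes // instances


if __name__ == "__main__":
    budget = per_query_memory(8 * 1024**3)  # 8 GiB of direct memory
    share = per_instance_memory(budget, [4, 4, 4])
    print(budget, share)
```

With 8 GiB of direct memory and the default values, 5 percent of direct memory is smaller than the 2 GiB fixed limit, so the fixed limit applies in this model. Three operators with four minor fragments each give twelve instances sharing that budget.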

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/configure-drill/configuration-options/020-start-up-options.md
----------------------------------------------------------------------
diff --git 
a/_docs/configure-drill/configuration-options/020-start-up-options.md 
b/_docs/configure-drill/configuration-options/020-start-up-options.md
index 36400d8..9ab2e4b 100644
--- a/_docs/configure-drill/configuration-options/020-start-up-options.md
+++ b/_docs/configure-drill/configuration-options/020-start-up-options.md
@@ -1,6 +1,6 @@
 ---
 title: "Start-Up Options"
-date: 2017-08-17 21:20:19 UTC
+date: 2018-03-14 00:58:06 UTC
 parent: "Configuration Options"
 ---
 The start-up options for Drill reside in 
[HOCON](https://github.com/typesafehub/config/blob/master/HOCON.md) 
configuration files. HOCON is a hybrid between a properties file and a 
JSON file. Drill start-up options consist of a group of files with a nested 
relationship. At the bottom of the file hierarchy are the default files that 
Drill provides, starting with `drill-default.conf`. 
@@ -56,10 +56,10 @@ The summary of start-up options, also known as boot 
options, lists default value
   Defines the amount of memory available, in terms of record batches, to hold 
data on the downstream side of an operation. Drill pushes data downstream as 
quickly as possible to make data immediately available. This requires Drill to 
use memory to hold the data pending operations. When data on a downstream 
operation is required, that data is immediately available so Drill does not 
have to go over the network to process it. Providing more memory to this option 
increases the speed at which Drill completes a query.  
   
 * **drill.exec.spill.fs**  
-Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort and Hash Aggregate operators spill data. This is the recommended 
option to use for spilling. You can configure this option so that data spills 
into a distributed file system, such as hdfs. For example, "hdfs:///". The 
default setting is "file:///". See [Sort-Based and Hash-Based Memory 
Constrained 
Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/)
 for more information.   
+Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort, Hash Aggregate, and Hash Join operators spill data. This is the 
recommended option to use for spilling. You can configure this option so that 
data spills into a distributed file system such as HDFS, for example 
"hdfs:///". The default setting is "file:///". See [Sort-Based and Hash-Based 
Memory Constrained 
Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/)
 for more information.   
   
 * **drill.exec.spill.directories**  
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash 
Aggregate operators spill data. The list must be an array with directories 
separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , 
"/fs3/drill/spill"]. This is the recommended option for spilling to multiple 
directories. The default setting is ["/tmp/drill/spill"]. See [Sort-Based and 
Hash-Based Memory Constrained 
Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/)
 for more information.  
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash 
Aggregate, and Hash Join operators spill data. The list must be an array with 
directories separated by a comma, for example ["/fs1/drill/spill" , 
"/fs2/drill/spill" , "/fs3/drill/spill"]. This is the recommended option for 
spilling to multiple directories. The default setting is ["/tmp/drill/spill"]. 
See [Sort-Based and Hash-Based Memory Constrained 
Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/)
 for more information.  
 
 * **drill.exec.zk.connect**  
   Provides Drill with the ZooKeeper quorum to use to connect to data sources. 
Change this setting to point to the ZooKeeper quorum that you want Drill to 
use. You must configure this option on each Drillbit node.  
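
Because the directories listed in drill.exec.spill.directories need enough free space to hold spilled data, it can help to check them before pointing Drill at them. The following Python sketch is one hypothetical way to do that on the local file system; the function name and the required-bytes threshold are assumptions for illustration and are not part of Drill.

```python
import os
import shutil


def check_spill_directories(directories, required_bytes):
    """Report whether each candidate spill directory has enough free space.

    Returns a dict mapping each path to True (enough free space),
    False (too little free space), or None (not an existing directory).
    Only local paths are handled; remote file systems such as
    "hdfs:///" are outside the scope of this sketch.
    """
    report = {}
    for path in directories:
        if not os.path.isdir(path):
            report[path] = None
            continue
        free_bytes = shutil.disk_usage(path).free
        report[path] = free_bytes >= required_bytes
    return report


if __name__ == "__main__":
    # Candidate directories copied from the example in the option text.
    candidates = ["/fs1/drill/spill", "/fs2/drill/spill", "/fs3/drill/spill"]
    for path, ok in check_spill_directories(candidates, 100 * 1024**3).items():
        print(path, ok)
```

The 100 GiB threshold in the usage example is only a placeholder; choose a value based on the size of the data your queries are expected to spill.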

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
----------------------------------------------------------------------
diff --git 
a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
 
b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
index 999e026..315150c 100644
--- 
a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
+++ 
b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
@@ -1,91 +1,111 @@
 ---
 title: "Sort-Based and Hash-Based Memory-Constrained Operators"
-date: 2017-08-18 17:48:11 UTC
+date: 2018-03-14 00:58:06 UTC
 parent: "Query Plans and Tuning"
 --- 
 
-Drill uses hash-based and sort-based operators depending on the query 
characteristics. Hash Aggregate and Hash Join are hash-based operators. Sort, 
Streaming Aggregate, and Merge Join are sort-based operators. Both hash-based 
and sort-based operations consume memory, however the Hash Aggregate and Hash 
Join operators are the fastest and most memory intensive operators. 
+Drill uses operators to sort, join, and aggregate data when executing queries. 
Drill uses the Sort operator to sort data. Drill can use the Hash Join or 
Hash Aggregate operators to join or aggregate data, or Drill can sort the data and then use 
the Merge Join or Streaming Aggregate operators to join or aggregate the data. 
 
-When planning a query with sort- and hash-based operations, Drill evaluates 
the available memory multiplied by a configurable reduction constant (for 
parallelization purposes) and then limits the operations to the maximum of this 
amount of memory. Drill spills data to disk if the sort and hash aggregate 
operations cannot be performed in memory. Alternatively, you can disable large 
hash operations if they do not fit in memory on your system. When disabled, 
Drill creates alternative plans. You can also modify the minimum hash table 
size, increasing the size for very large aggregations or joins when you have 
large amounts of memory for Drill to use. If you have large data sets, you can 
increase the hash table size to improve performance. 
+The Hash operators typically perform better; however, they are more memory 
intensive than the Merge Join and Streaming Aggregate operators. The Sort 
operator may use as much or even more memory than the Hash operators. If you 
want to see the difference in memory consumption between the operators, you can 
run a query and view the query profile in the Drill Web Console. Optionally, 
you can disable the Hash operators to force Drill to use the Merge Join and 
Streaming Aggregate operators. 
 
-##Memory Options
-The `planner.memory.max_query_memory_per_node` option sets the maximum amount 
of direct memory allocated to the Sort and Hash Aggregate operators during each 
query on a node. The default limit is set to 2147483648 bytes (2GB), which 
should be increased for queries on large data sets. This memory is split 
between operators. If a query plan contains multiple Sort and/or Hash Aggregate 
operators, the memory is divided between them.
+When a query requires sorting, joining, and aggregation, Drill divides the 
available memory equally among the instances of these memory-intensive operators in 
a query. The number of instances is equivalent to the number of these operators 
in the query plan, each multiplied by its degree of parallelism. The degree of 
parallelism is the number of minor fragments required to perform the work for 
each instance of an operator. When an instance of an operator must process more 
data than it can hold, the operator temporarily spills some of the data to a 
directory on disk to complete its work.  
 
-When a query is parallelized, the number of operators is multiplied, which 
reduces the amount of memory given to each instance of the Sort and Hash 
Aggregate operators during a query. If you encounter memory issues when running 
queries with Sort and Hash Aggregate operators, calculate the memory 
requirements for your queries and the amount of available memory on each node. 
Based on the information, increase the value of the 
`planner.memory.max_query_memory_per_node` option using the ALTER 
SYSTEM|SESSION SET command, as shown:  
 
-    ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = 
<new_value>  
-  
+##Spill to Disk  
 
-The `planner.memory.enable_memory_estimation` option toggles the state of 
memory estimation and re-planning of a query. When enabled, Drill 
conservatively estimates memory requirements and typically excludes 
memory-constrained operators from the query plan, which can negatively impact 
performance. The default setting is false. If you want Drill to use very 
conservative memory estimates, use the ALTER SYSTEM|SESSION SET command to 
change the setting, as shown:  
+Spilling to disk prevents queries that use memory intensive operations from 
failing with out-of-memory errors. The Spill to Disk feature enables the Sort, 
Hash Aggregate, and Hash Join operators to automatically write excess data (as 
files) to a temporary directory on disk when the memory requirements for the 
operators exceed the set memory limit. Queries run uninterrupted while the 
operators perform the spill operations in the background.
 
-    ALTER SYSTEM|SESSION SET `planner.memory.enable_memory_estimation` = true  
+When the Sort, Hash Aggregate, and Hash Join operators finish processing the 
data in memory, they read the spilled data back from disk and then finish 
processing the data. The operators clean up their data (files) from the 
temporary spill location after they finish processing the data. 
 
- 
-##Spill to Disk  
-Spilling data to disk prevents queries that use memory-intensive Sort and Hash 
Aggregate operations from failing with out-of-memory errors. Drill 
automatically writes excess data to a temporary directory on disk when queries 
with Sort or Hash Aggregate operations exceed the set memory limit on a Drill 
node. When the operators finish processing the in-memory data, Drill reads the 
spilled data back from disk, and the operators finish processing the data. When 
the operations complete, Drill removes the data from disk.  
+Ideally, you want to allocate enough memory for Drill to perform all 
operations in memory. When data spills to disk, you will not see any difference 
in terms of how queries run; however, spilling to disk can impact performance 
due to the additional I/O required to write data to disk and read the data 
back. See the Memory Allocation section for more information. 
 
-Spilling data to disk enables queries to run uninterrupted while Drill 
performs the spill operations in the background. However, there can be 
performance impact due to the time required to spill data and then read the 
data back from disk.  
+**Note:** Drill 1.13 and later support spilling to disk for the Hash Join, 
Hash Aggregate, and Sort operators. Drill 1.11 and 1.12 support spilling to 
disk for the Hash Aggregate and Sort operators. Releases of Drill prior to 1.11 
support spilling to disk only for the Sort operator.  
 
-{% include startnote.html %}Drill 1.11 and later supports spilling to disk for 
the Hash Aggregate operator in addition to the Sort operator. Previous releases 
of Drill only supported spilling to disk for the Sort operator.{% include 
endnote.html %}  
+**Spill Locations** 
 
-###Spill Locations  
-Drill writes data to a temporary work area on disk. The default location of 
the temporary work area is /tmp/drill/spill on the local file system. The 
/tmp/drill/spill directory should suffice for small workloads or examples, 
however it is highly recommended that you redirect the default spill location 
to a location with enough disk space to support spilling for large workloads.  
- 
-{% include startnote.html %}Spilled data may require more space than the table 
referenced in the query that is spilling the data. For example, if a table is 
100 GB per node, the spill directory should have the capacity to hold more than 
100 GB.{% include endnote.html %}
- 
-When you configure the spill location, you can specify a single directory, or 
a list of directories into which the sort and hash aggregate operators both 
spill. Alternatively, you can set specific spill directories for each type of 
operator, however this is not recommended as these options will be deprecated 
in future releases of Drill. For more information, see the Spill to Disk 
Configuration Options section below.  
+The Sort, Hash Aggregate, and Hash Join operators write data to a temporary 
work area on disk when they cannot process all of the data in memory. The 
default location of the temporary work area is /tmp/drill/spill on the local 
file system. 
 
-###Spill to Disk Configuration Options  
-The options related to spilling reside in the drill-override.conf file on each 
Drill node. An administrator or someone familiar with storage and disks should 
manage these settings.
+The /tmp/drill/spill directory should suffice for small workloads or examples; 
however, it is highly recommended that you redirect the default spill location 
to a location with enough disk space to support spilling for large workloads.
 
-{% include startnote.html %}You can see examples of these configuration 
options in the drill-override-example.conf file located in the 
<drill_installation>/conf directory.{% include endnote.html %} 
+**Note:** Spilled data may require more space than the table referenced in the 
query that is spilling the data. For example, if a table is 100 GB per node, 
the spill directory should have the capacity to hold more than 100 GB.
 
-The following list describes the configuration options for spilling data to 
disk:  
+When you configure the spill location, you can specify a single directory or a 
list of directories into which the Sort, Hash Aggregate, and Hash Join 
operators spill data. For more information, see the Spill to Disk Configuration 
Options section below.  
 
-* **drill.exe.spill.fs**  
-Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort and Hash Aggregate operators spill data. This is the recommended 
option to use for spilling. You can configure this option so that data spills 
into a distributed file system, such as hdfs. For example, "hdfs:///". The 
default setting is "file:///".  
-  
-* **drill.exec.spill.directories**  
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash 
Aggregate operators spill data. The list must be an array with directories 
separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , 
"/fs3/drill/spill"]. This is the recommended option for spilling to multiple 
directories. The default setting is ["/tmp/drill/spill"].  
-  
-* **drill.exec.sort.external.spill.fs**    
-Overrides the default location into which the Sort operator spills data. 
Instead of spilling into the location set by the `drill.exec.spill.fs` option, 
the Sort operators spill into the location specified by this option.  
-**Note:** As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the `drill.exec.spill.fs` option to set the 
spill location instead. The default setting is "file:///".  
+**Spill to Disk Configuration Options**  
 
-* **drill.exec.sort.external.spill.directories**   
-Overrides the location into which the Sort operator spills data. Instead of 
spilling into the location set by the `drill.exec.spill.directories` option, 
the Sort operators spill into the directories specified by this option. The 
list must be an array with directories separated by a comma, for example 
["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].  
-**Note:** As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the `drill.exec.spill.directories` option to 
set the spill location instead. The default setting is ["/tmp/drill/spill"].  
- 
-* **drill.exec.hashagg.spill.fs**  
-Overrides the location into which the Hash Aggregate operator spills data. 
Instead of spilling into the location set by the `drill.exec.spill.fs` option, 
the Hash Aggregate operator spills into the location specified by this option. 
Setting this option to 1 disables spilling for the Hash Aggregate operator.  
-**Note:** As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the `drill.exec.spill.fs` option to set the 
spill location instead. The default setting is "file:///".  
-  
-* **drill.exec.hashagg.spill.directories**    
-Overrides the location into which the Hash Aggregate operator spills data. 
Instead of spilling into the location set by the `drill.exec.spill.directories` 
option, the Hash Aggregate operator spills to the directories specified by this 
option. The list must be an array with directories separated by a comma, for 
example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].  
-**Note:** As of Drill 1.11, this option is supported for backward 
compatibility, however in future releases, this option will be deprecated. It 
is highly recommended that you use the `drill.exec.spill.directories option` to 
set the spill location instead.  
+The drill-override.conf file, located in the /conf directory, contains options 
that set the spill locations for the Hash and Sort operators. An administrator 
can change the file system and directories into which the operators spill data. 
Refer to the drill-override-example.conf file for examples. 
+
+The following list describes the spill to disk configuration options:  
+
+- **drill.exec.spill.fs**  
+Introduced in Drill 1.11. The default file system on the local machine into 
which the Sort, Hash Aggregate, and Hash Join operators spill data. You can 
configure this option so that data spills into a distributed file system such 
as HDFS, for example "hdfs:///". The default setting is "file:///".
+- **drill.exec.spill.directories**  
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash 
Aggregate, and Hash Join operators spill data. The list must be an array with 
directories separated by a comma, for example ["/fs1/drill/spill" , 
"/fs2/drill/spill" , "/fs3/drill/spill"]. The default setting is 
["/tmp/drill/spill"].  
+
+**Note:** The following options were available prior to Drill 1.11, but have 
since been deprecated and replaced with the options described above:  
+
+- drill.exec.sort.external.spill.fs (Replaced by drill.exec.spill.fs)
+- drill.exec.sort.external.spill.directories (Replaced by 
drill.exec.spill.directories)
+- drill.exec.hashagg.spill.fs (Replaced by drill.exec.spill.fs)  
+
+
+##Memory Allocation  
+
+Drill evenly splits the available memory among all instances of the Sort, Hash 
Aggregate, and Hash Join operators. When a query is parallelized, the number of 
operators is multiplied, which reduces the amount of memory given to each 
instance of the operators during a query.  
 
+**Memory Allocation Configuration Options**  
 
-##Hash-Based Operator Configuration Settings
-Use the ALTER SYSTEM|SESSION SET commands with the options below to disable 
the Hash Aggregate and Hash Join operators, modify the hash table size, or 
disable memory estimation. Typically, you set the options at the session level 
unless you want the setting to persist across all sessions.
+The `planner.memory.max_query_memory_per_node` and 
`planner.memory.percent_per_query` options set the amount of memory that Drill 
can allocate to a query on a node. Both options are enabled by default. Of 
these two options, Drill picks the setting that provides the most memory.  
 
-The following options control the hash-based operators:
+- **planner.memory.max_query_memory_per_node**  
+The `planner.memory.max_query_memory_per_node` option, set to 2 GB by default, 
is the minimum amount of memory available to Drill per query on a node. The 
default of 2 GB typically allows between two and three concurrent queries to 
run when the JVM is configured to use 8 GB of direct memory (the default). As 
the memory requirement for Drill increases, the default of 2 GB becomes 
constraining. You must increase the amount of memory for queries to complete, 
unless the setting for the `planner.memory.percent_per_query` option allows 
Drill to use more memory.
+- **planner.memory.percent_per_query**  
+Alternatively, the `planner.memory.percent_per_query` option sets the memory 
as a percentage of the total direct memory. For example, if the allocation is 
set to 10%, and the total direct memory is 128 GB, each query gets 
approximately 13 GB.  
 
-* **planner.enable_hashagg**  
-Enables or disables hash aggregation; otherwise, Drill does a sort-based 
aggregation. This option is enabled by default. The default, and recommended, 
setting is true. 
-The Hash Aggregate operator uses an uncontrolled amount of memory, up to 10 
GB, after which the operator runs out of memory. As of Drill 1.11, the Hash 
Aggregate operator can write to disk. 
+The percentage is calculated using the following formula:  
 
-* **planner.enable_hashjoin**  
-Enables or disables the memory hungry hash join. Drill assumes that a query 
will have adequate memory to complete and tries to use the fastest operations 
possible to complete the planned inner, left, right, or full outer joins using 
a hash table. The Hash Join operator uses an uncontrolled amount of memory, up 
to 10 GB, after which the operator runs out of memory. Currently, this operator 
does not write to disk. Disabling hash join allows Drill to manage arbitrarily 
large data in a small memory footprint. This option is enabled by default. The 
default setting is true.
+       (1 - non-managed allowance)/concurrency
 
-* **exec.min_hash_table_size**  
-Starting size for hash tables. Increase this setting based on the memory 
available to improve performance. The default setting for this option is 65536. 
The setting can range from 0 to 1073741824.
+The non-managed allowance is the assumed share of system memory that 
non-managed operators use. Non-managed operators do not spill to disk. The 
default non-managed allowance is 50% of the total system memory. The 
concurrency is the number of queries that may run concurrently; the default 
assumption is 10.
 
-* **exec.max\_hash\_table_size**  
-Ending size for hash tables. The default setting for this option is 
1073741824. The setting can range from 0 to 1073741824.
+Based on the default assumptions, the default value of 5% is calculated as 
follows:  
 
+       (1 - .50)/10 = 0.05  
 
+This value is only used when throttling is disabled. Setting the value to 0 
disables the option. You can increase or decrease the value; however, you 
should keep the percentage well below the total JVM direct memory to account 
for the cases where Drill does not manage memory, such as for the less 
memory-intensive operators.  
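
As a worked illustration of how the two options interact, the arithmetic below assumes that the per-query limit is simply the larger of `planner.memory.max_query_memory_per_node` and `planner.memory.percent_per_query` multiplied by the total direct memory, which is how this section describes the selection. The symbols M_query, M_direct, and p are introduced here for illustration only and do not appear in Drill itself.

```latex
% M_direct: total direct memory; p: planner.memory.percent_per_query.
% Assumed selection rule: the larger of the two settings applies.
\[
M_{\text{query}} = \max\bigl(2\ \text{GB},\; p \cdot M_{\text{direct}}\bigr)
\]

% Defaults with 8 GB of direct memory: 0.05 * 8 GB = 0.4 GB, so 2 GB applies.
\[
M_{\text{query}} = \max(2\ \text{GB},\; 0.05 \times 8\ \text{GB}) = 2\ \text{GB}
\]

% Defaults with 128 GB of direct memory: 0.05 * 128 GB = 6.4 GB applies.
\[
M_{\text{query}} = \max(2\ \text{GB},\; 0.05 \times 128\ \text{GB}) = 6.4\ \text{GB}
\]
```

Under these assumptions, the percentage setting only takes over from the 2 GB floor once the direct memory exceeds 40 GB (2 GB / 0.05).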
+
+**Increasing the Available Memory**  
+
+You can increase the amount of memory available to Drill using the ALTER 
SYSTEM|SESSION SET commands with the `planner.memory.max_query_memory_per_node` 
or `planner.memory.percent_per_query` options, as shown:  
+
+       ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = 
<new_value>
+       //The default value is 2147483648 bytes (2 GB). 
+       
+       ALTER SYSTEM|SESSION SET `planner.memory.percent_per_query` = 
<new_value>
+       //The default value is 0.05.  
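
For instance, a session that needs more than the default might raise either option as sketched below. The values 4294967296 (4 GB) and 0.10 are arbitrary illustrations, not recommendations. The query against `sys.options` is one way to check the result, assuming that system table is available in your release.

```sql
-- Raise the per-node, per-query limit to 4 GB for this session only
-- (4294967296 bytes is an illustrative value, not a recommendation).
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 4294967296;

-- Alternatively, let each query use 10% of the total direct memory.
ALTER SESSION SET `planner.memory.percent_per_query` = 0.10;

-- Inspect the current memory-related planner options.
SELECT * FROM sys.options WHERE name LIKE 'planner.memory%';
```

Using ALTER SESSION keeps the change local to the current connection; ALTER SYSTEM applies it to all sessions.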
+
+##Disabling the Hash Operators  
+
+You can disable the Hash Aggregate and Hash Join operators. When you disable 
these operators, Drill creates alternative query plans that use the Sort 
operator and the Streaming Aggregate or the Merge Join operator. 
+
+Use the ALTER SYSTEM|SESSION SET commands with the following options to 
disable the Hash Aggregate and Hash Join operators. Typically, you set the 
options at the session level unless you want the setting to persist across all 
sessions. 
+
+The following options control the hash-based operators:  
+
+- **planner.enable_hashagg**  
+Enables or disables hash aggregation; otherwise, Drill does a sort-based 
aggregation. This option is enabled by default. The default, and recommended, 
setting is true. Prior to Drill 1.11, the Hash Aggregate operator used an 
uncontrolled amount of memory (up to 10 GB), after which the operator ran out 
of memory. As of Drill 1.11, the Hash Aggregate operator can write to disk.
+- **planner.enable_hashjoin**  
+Enables or disables hash joins. This option is enabled by default. Drill 
assumes that a query will have adequate memory to complete and tries to use the 
fastest operations possible. Prior to Drill 1.11, the Hash Join operator used an 
uncontrolled amount of memory (up to 10 GB), after which the operator ran out 
of memory. As of Drill 1.13, this operator can write to disk.
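
A minimal sketch of toggling the two options listed above at the session level, so that the change does not persist across other sessions:

```sql
-- Disable both hash-based operators for the current session only.
-- Drill then plans with the Sort operator plus Streaming Aggregate
-- or Merge Join instead.
ALTER SESSION SET `planner.enable_hashagg` = false;
ALTER SESSION SET `planner.enable_hashjoin` = false;

-- Re-enable them (true is the default for both options).
ALTER SESSION SET `planner.enable_hashagg` = true;
ALTER SESSION SET `planner.enable_hashjoin` = true;
```

Substitute ALTER SYSTEM for ALTER SESSION only if the setting should apply to all sessions.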
+
+
+
+
+
+
+ 
   
 
 
 
 
+

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/team.md
----------------------------------------------------------------------
diff --git a/team.md b/team.md
index 94583d6..8a039d1 100755
--- a/team.md
+++ b/team.md
@@ -43,4 +43,5 @@ We welcome contributions to the project. If you're interested 
in contributing, t
 | Anil Kumar Batchu | akumarb2010 |  
 | Vitalii Diravka  | vitalii |  
 | Kamesh Bhallamudi | kameshb |  
+| Kunal Khatua | kunal |
 
