This is an automated email from the ASF dual-hosted git repository.

bridgetb pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/drill-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new a4acaa3  edits
a4acaa3 is described below

commit a4acaa3d3ef201193177e797ee3b1d4c755e0a5f
Author: Bridget Bevens <[email protected]>
AuthorDate: Thu Dec 20 18:24:08 2018 -0800

    edits
---
 .../index.html                                             | 14 ++++++++------
 feed.xml                                                   |  4 ++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/docs/sort-based-and-hash-based-memory-constrained-operators/index.html b/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
index d26cfb8..5258fe0 100644
--- a/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
+++ b/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
@@ -1293,7 +1293,7 @@
 
     </div>
 
-     Dec 19, 2018
+     Dec 21, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1308,7 +1308,7 @@
 </ul>
 
 <p>Drill only uses the External Sort operator to sort data. Drill uses the 
Hash-Aggregate operator to aggregate data. Alternatively, Drill can sort the 
data and then use the (lightweight) Streaming-Aggregate operator to aggregate 
data.
-Drill uses the Hash-Join operator to join data. Drill 1.15 introduces 
semi-join functionality inside the Hash-Join operator to improve query 
performance. Semi-joins remove the distinct processing below the Hash-Join and 
eliminate the overhead incurred from using a Hash Aggregate. Prior to Drill 
1.15 (or when <a 
href="/docs/sort-based-and-hash-based-memory-constrained-operators/#disabling-the-hash-operators">semi-join
 functionality is disabled</a>), Drill uses a Distinct Hash Aggregate to [...]
+Drill uses the Hash-Join operator to join data. Drill 1.15 introduces 
semi-join functionality inside the Hash-Join operator to improve query 
performance. Semi-joins remove the distinct processing below the Hash-Join and 
eliminate the overhead incurred from using a Hash Aggregate. Prior to Drill 
1.15 (or when <a 
href="/docs/sort-based-and-hash-based-memory-constrained-operators/#disabling-the-hash-operators">semi-join
 functionality is disabled</a>), Drill used a Distinct Hash Aggregate to [...]
 
 <p>The memory configuration in Drill is specified as the memory limit 
per-query, per-node. The allocated memory is equally divided among all 
instances of the spillable operators (per query on each node). The number of 
instances is the number of spillable operators in the query plan multiplied by 
the maximal degree of parallelism. The maximal degree of parallelism is the 
number of minor fragments required to perform the work for each instance of a 
spillable operator. When an instance of a [...]
 
@@ -1401,7 +1401,7 @@ Enables or disables hash aggregation; otherwise, Drill 
does a sort-based aggrega
 <li><p><strong>planner.enable_hashjoin</strong><br>
 Enables or disables hash joins. This option is enabled by default. Drill 
assumes that a query will have adequate memory to complete and tries to use the 
fastest operations possible. Prior to Drill 1.14, the Hash-Join operator used 
an uncontrolled amount of memory (up to 10 GB), after which the operator ran 
out of memory. As of Drill 1.14, this operator can spill to disk. This option 
is enabled by default.    </p></li>
 <li><p><strong>planner.enable_semijoin</strong><br>
-Enables or disables semi-joins. This option is enabled by default and only 
works when the <code>planner.enable_hashjoin</code> option is also enabled. 
When enabled, Drill uses semi-joins to remove the distinct processing below the 
Hash Join and sets the semi-join flag in the Hash Join flag, as shown in the 
following example:  </p></li>
+Enables or disables semi-join functionality inside the Hash Join. This option 
is enabled by default. When enabled, a semi-join flag inside the HashJoin flag 
is set to true, and Drill uses a semi-join to remove the distinct processing 
below the Hash Join. When disabled, Drill can still perform semi-joins, but the 
semi-joins are performed outside of the Hash Join, as shown in the following 
example:   </p></li>
 </ul>
 
 <h3 id="example-query-plan-with-and-without-semi-join">Example: Query Plan 
with and without Semi-Join</h3>
@@ -1425,9 +1425,11 @@ planner.enable_semijoin
 </code></pre></div>
 <p><strong>Semi-Join Enabled</strong><br>
 In the following query plan, you can see that the HashAgg is absent. In the 
HashJoin flag, you can see that semi-join flag is set to true, indicating that 
a semi-join was used. Using the semi-join optimizes the query by reducing the 
amount of processing that Drill must perform on data.   </p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text">EXPLAIN PLAN FOR SELECT employee_id, full_name FROM 
cp.`employee.json` WHERE employee_id IN (SELECT employee_id FROM 
cp.`employee.json`);
---------------------------------------------------------------------------------+
-|                                       text                                   
    |                                     
+----------------------------------------------------------------------------------+
+<div class="highlight"><pre><code class="language-text" 
data-lang="text">EXPLAIN PLAN FOR SELECT employee_id, full_name FROM 
cp.`employee.json` WHERE employee_id IN (SELECT employee_id FROM 
cp.`employee.json`);  
+
++----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
+|                                       text                                   
    |                                                            
++----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
 | 00-00    Screen
 00-01      Project(employee_id=[$0], full_name=[$1])
 00-02        HashJoin(condition=[=($0, $2)], joinType=[inner], semi-join: 
=[true])
diff --git a/feed.xml b/feed.xml
index 0d3ebf0..a850376 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 20 Dec 2018 15:39:02 -0800</pubDate>
-    <lastBuildDate>Thu, 20 Dec 2018 15:39:02 -0800</lastBuildDate>
+    <pubDate>Thu, 20 Dec 2018 18:21:08 -0800</pubDate>
+    <lastBuildDate>Thu, 20 Dec 2018 18:21:08 -0800</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>
