This is an automated email from the ASF dual-hosted git repository.

mhubail pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/asterixdb.git

commit c402e3a79f6320c72898424745fdab1e4da57405
Author: Dmitry Lychagin <[email protected]>
AuthorDate: Fri Oct 30 11:01:33 2020 -0700

    [NO ISSUE][DOC] Document hash and hash-bcast hints
    
    - user model changes: no
    - storage format changes: no
    - interface changes: no
    
    Details:
    - Add documentation for hash and hash-bcast hints
    
    Change-Id: If36f35c994c05cb244d5c4a08fc75680d1251920
    Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8604
    Integration-Tests: Jenkins <[email protected]>
    Tested-by: Jenkins <[email protected]>
    Reviewed-by: Dmitry Lychagin <[email protected]>
    Reviewed-by: Till Westmann <[email protected]>
---
 asterixdb/asterix-doc/pom.xml                      |  2 +-
 .../asterix-doc/src/main/markdown/sqlpp/0_toc.md   |  1 +
 .../src/main/markdown/sqlpp/appendix_2_hints.md    | 52 ++++++++++++++++++++++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/asterixdb/asterix-doc/pom.xml b/asterixdb/asterix-doc/pom.xml
index ea23f5e..e1a1e69 100644
--- a/asterixdb/asterix-doc/pom.xml
+++ b/asterixdb/asterix-doc/pom.xml
@@ -52,7 +52,7 @@
             <configuration>
               <target>
                 <concat 
destfile="${project.build.directory}/generated-site/markdown/sqlpp/manual.md">
-                  <filelist dir="${project.basedir}/src/main/markdown/sqlpp" 
files="0_toc.md,1_intro.md,2_expr_title.md,2_expr.md,3_query_title.md,3_declare_dataverse.md,3_declare_function.md,3_query.md,4_error_title.md,4_error.md,5_ddl_head.md,5_ddl_dataset_index.md,5_ddl_function_removal.md,5_ddl_dml.md,appendix_1_title.md,appendix_1_keywords.md,appendix_2_title.md,appendix_2_parameters.md,appendix_2_parallel_sort.md,appendix_2_index_only.md,appendix_3_title.md,appendix_3_resolution.md"
 />
+                  <filelist dir="${project.basedir}/src/main/markdown/sqlpp" 
files="0_toc.md,1_intro.md,2_expr_title.md,2_expr.md,3_query_title.md,3_declare_dataverse.md,3_declare_function.md,3_query.md,4_error_title.md,4_error.md,5_ddl_head.md,5_ddl_dataset_index.md,5_ddl_function_removal.md,5_ddl_dml.md,appendix_1_title.md,appendix_1_keywords.md,appendix_2_title.md,appendix_2_parameters.md,appendix_2_parallel_sort.md,appendix_2_index_only.md,appendix_2_hints.md,appendix_3_title.md,appe
 [...]
                 </concat>
                 <concat 
destfile="${project.build.directory}/generated-site/markdown/sqlpp/builtins.md">
                   <filelist 
dir="${project.basedir}/src/main/markdown/builtins" 
files="0_toc.md,0_toc_sqlpp.md,0_toc_common.md,1_numeric_common.md,1_numeric_delta.md,2_string_common.md,2_string_delta.md,3_binary.md,4_spatial.md,5_similarity.md,6_tokenizing.md,7_temporal.md,7_allens.md,8_record.md,9_aggregate_sql.md,10_comparison.md,11_type.md,13_conditional.md,12_misc.md,15_bitwise.md,14_window.md"
 />
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/0_toc.md 
b/asterixdb/asterix-doc/src/main/markdown/sqlpp/0_toc.md
index 1a5a3e6..8680c43 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/0_toc.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/0_toc.md
@@ -99,4 +99,5 @@
 * [Appendix 2. Performance Tuning](#Performance_tuning)
       * [Parallelism Parameter](#Parallelism_parameter)
       * [Memory Parameters](#Memory_parameters)
+      * [Query Hints](#Query_hints)
 * [Appendix 3. Variable Bindings and Name 
Resolution](#Variable_bindings_and_name_resolution)
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/appendix_2_hints.md 
b/asterixdb/asterix-doc/src/main/markdown/sqlpp/appendix_2_hints.md
new file mode 100644
index 0000000..0e4f470
--- /dev/null
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/appendix_2_hints.md
@@ -0,0 +1,52 @@
+<!--
+ ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements.  See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership.  The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License.  You may obtain a copy of the License at
+ !
+ !   http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied.  See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ !-->
+
+## <a id="Query_hints">Query Hints</a>
+
+#### <a id="hash_groupby">"hash" GROUP BY hint</a>
+
+The system supports two algorithms for GROUP BY clause evaluation: pre-sorted 
and hash-based.
+By default it uses the pre-sorted approach: The input data is first sorted on 
the grouping fields
+and then aggregation is performed on that sorted data. The alternative is a 
hash-based strategy
+which can be enabled via a `/*+ hash */` GROUP BY hint: The data is aggregated 
using an in-memory hash-table
+(that can spill to disk if necessary). This approach is recommended for 
low-cardinality grouping fields.
+
+##### Example:
+
+    SELECT c.address.state, count(*)
+    FROM Customers AS c
+    /*+ hash */ GROUP BY c.address.state
+
+#### <a id="hash_bcast_join">"hash-bcast" JOIN hint</a>
+
+By default the system uses a partitioned-parallel hash join strategy to 
parallelize the execution of an
+equi-join. In this approach both sides of the join are repartitioned (if 
necessary) on a hash of the join key;
+potentially matching data items thus arrive at the same partition to be joined 
locally.
+This strategy is robust, but not always the fastest when one of the join sides 
is low cardinality and
+the other is high cardinality (since it scans and potentially moves the data 
from both sides).
+This special case can be better handled by broadcasting (replicating) the 
smaller side to all data partitions
+of the larger side and not moving the data from the other (larger) side. The 
system provides a join hint to enable
+this strategy: `/*+ hash-bcast */`. This hint forces the right side of the 
join to be replicated while the left side
+retains its original partitioning.
+
+##### Example:
+
+    SELECT *
+    FROM Orders AS o JOIN Customers AS c
+    ON o.customer_id /*+ hash-bcast */ = c.customer_id

Reply via email to