Shiva Jahangiri has uploaded this change for review. ( https://asterix-gerrit.ics.uci.edu/3350 )


Change subject: [NO ISSUE] Changed the minimum number of partitions in optimized hybrid hash join
......................................................................

[NO ISSUE] Changed the minimum number of partitions in optimized hybrid hash join

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
Because AsterixDB currently has no statistics, the formula for calculating
the number of partitions for hybrid hash join does not give an accurate answer.
In addition, the minimum number of partitions is set to 2. With a large build
relation, we may therefore end up with 2 large partitions, which can cause
several rounds of recursion. Simulations showed that setting the minimum number
of partitions to 20 keeps more data in memory and reduces the rounds of
recursion, because each partition now holds less data.
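
To illustrate the reasoning above, here is a minimal sketch under a deliberately
simplified model. It is not the Hyracks code: the class name, method name, and
input sizes are hypothetical, and it assumes every round splits a partition
evenly (no skew), ignores the hash table size, and ignores the partitions that
hybrid hash join keeps in memory. It only counts how many rounds of
re-partitioning are needed before one partition fits in memory.

```java
public class PartitionFanoutSketch {

    // Hypothetical helper: counts the rounds of partitioning needed until a
    // single partition holds at most memoryFrames frames, assuming each round
    // splits a partition evenly into `fanout` pieces.
    static int roundsUntilFits(long buildFrames, long memoryFrames, int fanout) {
        if (fanout < 2 || memoryFrames < 1) {
            throw new IllegalArgumentException("fanout must be >= 2 and memoryFrames >= 1");
        }
        int rounds = 0;
        long size = buildFrames;
        while (size > memoryFrames) {
            size = (size + fanout - 1) / fanout; // ceiling division
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        long buildFrames = 100_000; // made-up build relation size, in frames
        long memoryFrames = 100;    // made-up memory budget, in frames
        System.out.println(roundsUntilFits(buildFrames, memoryFrames, 2));  // prints 10
        System.out.println(roundsUntilFits(buildFrames, memoryFrames, 20)); // prints 3
    }
}
```

Under these assumptions, a fan-out of 20 reaches a memory-sized partition in far
fewer rounds than a fan-out of 2. Real workloads with skew will behave
differently.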

Change-Id: I0a92acbe43761121e9851a4f792f561d71eb9f61
---
M hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
1 file changed, 4 insertions(+), 4 deletions(-)



  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/50/3350/1

diff --git a/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java b/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
index 403c492..4acbd50 100644
--- a/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
+++ b/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
@@ -202,7 +202,7 @@
     private int getNumberOfPartitions(int memorySize, int buildSize, double factor, int nPartitions)
             throws HyracksDataException {
         int numberOfPartitions = 0;
-        if (memorySize <= 2) {
+        if (memorySize <= 20) {
             throw new HyracksDataException("Not enough memory is available for Hybrid Hash Join.");
         }
         if (memorySize > buildSize * factor) {
@@ -210,13 +210,13 @@
             // We set 2 (not 1) to avoid a corner case where the only partition may be spilled to the disk.
             // This may happen since this formula doesn't consider the hash table size. If this is the case,
             // we will do a nested loop join after some iterations. But, this is not effective.
-            return 2;
+            return 20;
         }
         numberOfPartitions = (int) (Math.ceil((buildSize * factor / nPartitions - memorySize) / (memorySize - 1)));
-        numberOfPartitions = Math.max(2, numberOfPartitions);
+        numberOfPartitions = Math.max(20, numberOfPartitions);
         if (numberOfPartitions > memorySize) {
             numberOfPartitions = (int) Math.ceil(Math.sqrt(buildSize * factor / nPartitions));
-            return Math.max(2, Math.min(numberOfPartitions, memorySize));
+            return Math.max(20, Math.min(numberOfPartitions, memorySize));
         }
         return numberOfPartitions;
     }
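
For experimenting with the patched logic outside Hyracks, the method can be
restated as a standalone sketch. The class name and the MIN_PARTITIONS constant
are hypothetical (the patch itself uses the literal 20), and
IllegalStateException stands in for HyracksDataException so the snippet compiles
without Hyracks on the classpath. The arithmetic is copied from the diff above.

```java
public class MinPartitionsSketch {

    // Hypothetical named constant; the patch inlines the literal 20.
    static final int MIN_PARTITIONS = 20;

    static int getNumberOfPartitions(int memorySize, int buildSize, double factor, int nPartitions) {
        if (memorySize <= MIN_PARTITIONS) {
            // Stand-in for HyracksDataException in the real operator.
            throw new IllegalStateException("Not enough memory is available for Hybrid Hash Join.");
        }
        if (memorySize > buildSize * factor) {
            // The build side is estimated to fit in memory: use the minimum.
            return MIN_PARTITIONS;
        }
        int numberOfPartitions =
                (int) Math.ceil((buildSize * factor / nPartitions - memorySize) / (memorySize - 1));
        numberOfPartitions = Math.max(MIN_PARTITIONS, numberOfPartitions);
        if (numberOfPartitions > memorySize) {
            // Fall back to the square-root estimate, capped by the memory size.
            numberOfPartitions = (int) Math.ceil(Math.sqrt(buildSize * factor / nPartitions));
            return Math.max(MIN_PARTITIONS, Math.min(numberOfPartitions, memorySize));
        }
        return numberOfPartitions;
    }

    public static void main(String[] args) {
        System.out.println(getNumberOfPartitions(100, 50, 1.2, 1));        // prints 20
        System.out.println(getNumberOfPartitions(100, 5000, 1.0, 1));      // prints 50
        System.out.println(getNumberOfPartitions(100, 1_000_000, 1.0, 1)); // prints 100
    }
}
```

One consequence of the raised guard is that memorySize is always greater than 20
past the first check, so the final Math.max(20, ...) never returns more
partitions than memorySize.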

--
To view, visit https://asterix-gerrit.ics.uci.edu/3350
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0a92acbe43761121e9851a4f792f561d71eb9f61
Gerrit-Change-Number: 3350
Gerrit-PatchSet: 1
Gerrit-Owner: Shiva Jahangiri <[email protected]>
