Shiva Jahangiri has uploaded this change for review. ( https://asterix-gerrit.ics.uci.edu/3350 )
Change subject: [NO ISSUE] Changed the minimum number of partitions in optimized hybrid hash join
......................................................................
[NO ISSUE] Changed the minimum number of partitions in optimized hybrid hash join
- user model changes: no
- storage format changes: no
- interface changes: no
Details:
AsterixDB currently has no statistics, so the formula that calculates the
number of partitions for hybrid hash join does not give an accurate answer.
In addition, the minimum number of partitions is set to 2. With a big build
relation, we may therefore end up with 2 big partitions, which can cause
several rounds of recursion. Simulations showed that raising the minimum
number of partitions to 20 keeps more data in memory and reduces the rounds
of recursion, because each partition now holds less data.
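The effect of the new floor can be sketched outside Hyracks. The standalone
class below is an illustration only. It mirrors the arithmetic of
getNumberOfPartitions in the patched file, but the class name
PartitionCountSketch, the minPartitions parameter, the IllegalStateException
(standing in for HyracksDataException), and all input values are assumptions
made for this example and are not part of the change.

```java
final class PartitionCountSketch {
    private PartitionCountSketch() {
    }

    /**
     * Mirrors the partition-count arithmetic of the patched method, with the
     * floor passed in as a (hypothetical) parameter so 2 and 20 can be compared.
     * Sizes are assumed to be in frames.
     */
    static int numberOfPartitions(int memorySize, int buildSize, double factor, int nPartitions,
            int minPartitions) {
        // The memory guard uses the same constant as the floor in the patch.
        if (memorySize <= minPartitions) {
            throw new IllegalStateException("Not enough memory is available for Hybrid Hash Join.");
        }
        // Build side is estimated to fit in memory: return the floor directly.
        if (memorySize > buildSize * factor) {
            return minPartitions;
        }
        int n = (int) Math.ceil((buildSize * factor / nPartitions - memorySize) / (memorySize - 1));
        n = Math.max(minPartitions, n);
        if (n > memorySize) {
            // Fall back to the square-root estimate, capped by the memory budget.
            n = (int) Math.ceil(Math.sqrt(buildSize * factor / nPartitions));
            return Math.max(minPartitions, Math.min(n, memorySize));
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 100 frames of memory, 1000-frame build side, fudge factor 1.2.
        System.out.println("floor 2:  " + numberOfPartitions(100, 1000, 1.2, 1, 2));
        System.out.println("floor 20: " + numberOfPartitions(100, 1000, 1.2, 1, 20));
        // Small build side that is estimated to fit in memory.
        System.out.println("small build, floor 20: " + numberOfPartitions(100, 50, 1.2, 1, 20));
    }
}
```

With these hypothetical inputs the formula itself yields 12 partitions for the
1000-frame build side, so a floor of 2 leaves the result at 12 while a floor of
20 raises it to 20. The memory guard moves with the floor as well: a budget of
20 frames or fewer, which the old check accepted down to 3, is rejected under
the new one.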
Change-Id: I0a92acbe43761121e9851a4f792f561d71eb9f61
---
M hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
1 file changed, 4 insertions(+), 4 deletions(-)
git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/50/3350/1
diff --git a/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java b/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
index 403c492..4acbd50 100644
--- a/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
+++ b/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoinOperatorDescriptor.java
@@ -202,7 +202,7 @@
     private int getNumberOfPartitions(int memorySize, int buildSize, double factor, int nPartitions)
             throws HyracksDataException {
         int numberOfPartitions = 0;
-        if (memorySize <= 2) {
+        if (memorySize <= 20) {
             throw new HyracksDataException("Not enough memory is available for Hybrid Hash Join.");
         }
         if (memorySize > buildSize * factor) {
@@ -210,13 +210,13 @@
             // We set 2 (not 1) to avoid a corner case where the only partition may be spilled to the disk.
             // This may happen since this formula doesn't consider the hash table size. If this is the case,
             // we will do a nested loop join after some iterations. But, this is not effective.
-            return 2;
+            return 20;
         }
         numberOfPartitions = (int) (Math.ceil((buildSize * factor / nPartitions - memorySize) / (memorySize - 1)));
-        numberOfPartitions = Math.max(2, numberOfPartitions);
+        numberOfPartitions = Math.max(20, numberOfPartitions);
         if (numberOfPartitions > memorySize) {
             numberOfPartitions = (int) Math.ceil(Math.sqrt(buildSize * factor / nPartitions));
-            return Math.max(2, Math.min(numberOfPartitions, memorySize));
+            return Math.max(20, Math.min(numberOfPartitions, memorySize));
         }
         return numberOfPartitions;
     }
--
To view, visit https://asterix-gerrit.ics.uci.edu/3350
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0a92acbe43761121e9851a4f792f561d71eb9f61
Gerrit-Change-Number: 3350
Gerrit-PatchSet: 1
Gerrit-Owner: Shiva Jahangiri <[email protected]>