Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1566, ASTERIXDB-1733: Hash Group By and Hash Join 
conform to the memory budget
......................................................................


Patch Set 60:

(17 comments)

Thanks Yingyi.

https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java:

Line 330:         BigInteger groupByPrimeNumberTableSize = 
BigInteger.valueOf(finalGroupByTableCardinality).nextProbablePrime();
> nextProbablePrime is very expensive.  I tried on my local machine and it us
It was Mike's idea that I agreed. I just talked to Mike and he agreed. This is 
removed.


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java:

Line 60:     private static final int MIN_OUTPUT_FRAME_LIMT = 1;
> MIN_OUTPUT_FRAME_LIMIT =1
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupOperatorDescriptor.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupOperatorDescriptor.java:

Line 70:              * Minimum of 4 frames: 1 for input records, 1 for output, 
and 2 for hash table (1 header and 1 content)
> If minimum is 4, the condition should frameLimit <= 3.  Correct?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/InMemoryHashJoin.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/InMemoryHashJoin.java:

Line 132:                         "Can't insert an entry into hash table. 
Please assign more memory to InMemoryHashJoin.");
> The error message is a bit hard to follow.
As we both know, the old contract works because it ignores the hash table 
footprint in memory. The contract is not changed. I just added one more safety 
check here.


Line 211:         LOGGER.fine("InMemoryHashJoin has finished using " + nFrames 
+ " frames for Thread ID "
> Check logging level so that we don't always do string concatenation.
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java:

Line 310:         if (freeSpace <= 0) {
> <=0  -->  <0 ?
Done


Line 336:                     long expectedHashTableSizeDecrease = 
hashTableByteSizeForInMemTuples
> you can calculate the expectedHashTableSizeDecrease without inMemTupCount.
Since ceil() is used to calculate the total number of frames that are required 
for the given hash table, slightly changing the number can affects the number 
of frames.


Line 342:                     if (freeSpace > 0) {
> freeSpace > 0  -->  freeSpace >= 0 ?
Done


Line 405:     private int selectPartitionsToReload(long freeSpace, int pid, int 
inMemTupCount, long originalHashTableSize) {
> Why do you need 
Since ceil() is used to calculate the total number of frames that are required 
for the given hash table, slightly changing the number can affects the number 
of frames.


Line 412:             if (freeSpace > buildRFWriters[i].getFileSize() + 
expectedHashTableSizeIncrease) {
> > -->  >= ?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SimpleSerializableHashTable.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SimpleSerializableHashTable.java:

Line 490:             bytes[offset++] = (byte) (value);
> Address this?  Remove the last ++.
Done


Line 490:             bytes[offset++] = (byte) (value);
> MAJOR SonarQube violation:
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java
File 
hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java:

Line 180:         int frameSize = 256;
> Is 50 too large?
Header: 4 pages, Content: 101 * 40 = 4040 = 15.78 = 16 pages
Changed it to 23.


Line 189:         int tableSize = 101;
> Is 50 too large?
Done


Line 192:         int minDataSize = frameSize * 80;
> Is 80 too large?
Adjusted to 40 after changing the frame count to 23.


Line 202:         int frameSize = 256;
> Is 50 too large?
Done


Line 203:         int minDataSize = frameSize * 80;
> Is 80 too large?
Adjusted to 40 after changing the frame count to 23.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1056
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Gerrit-PatchSet: 60
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Steven Jacobs <[email protected]>
Gerrit-Reviewer: Taewoo Kim <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Wenhai Li <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-HasComments: Yes

Reply via email to