Taewoo Kim has posted comments on this change. Change subject: ASTERIXDB-1566, ASTERIXDB-1733: Hash Group By and Hash Join conform to the memory budget ......................................................................
Patch Set 60: (17 comments) Thanks Yingyi. https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java: Line 330: BigInteger groupByPrimeNumberTableSize = BigInteger.valueOf(finalGroupByTableCardinality).nextProbablePrime(); > nextProbablePrime is very expensive. I tried on my local machine and it us It was Mike's idea that I agreed. I just talked to Mike and he agreed. This is removed. https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java: Line 60: private static final int MIN_OUTPUT_FRAME_LIMT = 1; > MIN_OUTPUT_FRAME_LIMIT =1 Done https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupOperatorDescriptor.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupOperatorDescriptor.java: Line 70: * Minimum of 4 frames: 1 for input records, 1 for output, and 2 for hash table (1 header and 1 content) > If minimum is 4, the condition should frameLimit <= 3. Correct? Done https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/InMemoryHashJoin.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/InMemoryHashJoin.java: Line 132: "Can't insert an entry into hash table. Please assign more memory to InMemoryHashJoin."); > The error message is a bit hard to follow. As we both know, the old contract works because it ignores the hash table footprint in memory. The contract is not changed. I just added one more safety check here. Line 211: LOGGER.fine("InMemoryHashJoin has finished using " + nFrames + " frames for Thread ID " > Check logging level so that we don't always do string concatenation. Done https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java: Line 310: if (freeSpace <= 0) { > <=0 --> <0 ? Done Line 336: long expectedHashTableSizeDecrease = hashTableByteSizeForInMemTuples > you can calculate the expectedHashTableSizeDecrease without inMemTupCount. Since ceil() is used to calculate the total number of frames that are required for the given hash table, slightly changing the number can affects the number of frames. Line 342: if (freeSpace > 0) { > freeSpace > 0 --> freeSpace >= 0 ? Done Line 405: private int selectPartitionsToReload(long freeSpace, int pid, int inMemTupCount, long originalHashTableSize) { > Why do you need Since ceil() is used to calculate the total number of frames that are required for the given hash table, slightly changing the number can affects the number of frames. Line 412: if (freeSpace > buildRFWriters[i].getFileSize() + expectedHashTableSizeIncrease) { > > --> >= ? Done https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SimpleSerializableHashTable.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SimpleSerializableHashTable.java: Line 490: bytes[offset++] = (byte) (value); > Address this? Remove the last ++. Done Line 490: bytes[offset++] = (byte) (value); > MAJOR SonarQube violation: Done https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java: Line 180: int frameSize = 256; > Is 50 too large? Header: 4 pages, Content: 101 * 40 = 4040 = 15.78 = 16 pages Changed it to 23. Line 189: int tableSize = 101; > Is 50 too large? Done Line 192: int minDataSize = frameSize * 80; > Is 80 too large? Adjusted to 40 after changing the frame count to 23. Line 202: int frameSize = 256; > Is 50 too large? Done Line 203: int minDataSize = frameSize * 80; > Is 80 too large? Adjusted to 40 after changing the frame count to 23. -- To view, visit https://asterix-gerrit.ics.uci.edu/1056 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5 Gerrit-PatchSet: 60 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo Kim <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Reviewer: Steven Jacobs <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Wenhai Li <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
