Taewoo Kim has posted comments on this change. Change subject: ASTERIXDB-1566, ASTERIXDB-1628: Fixed External Hash Group By to conform to the memory budget ......................................................................
Patch Set 9: (25 comments) https://asterix-gerrit.ics.uci.edu/#/c/1056/9/asterixdb/asterix-docker/docker/asterix-configuration.xml File asterixdb/asterix-docker/docker/asterix-configuration.xml: Line 251: more efficiently using pointers. Actual Group-by value is stored in data table. > "Group-by value" -> "aggregate values" Per our discussion, this parameter will not be used. Line 253: the same as the given value divided by 40 bytes. (E.g., 16MB / 40 = 419,430). > Can you add the explanation of why 40 is the number into the comments? Per our discussion, this parameter will not be used. https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java: Line 91: final ITuplePartitionComputer tpc1 = new FieldHashPartitionComputerFamily(intermediateResultKeys, > better variable name. e.g., Done https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/ISpillableTable.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/ISpillableTable.java: Line 55: * @return > annotate @return. Done https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java: Line 79: result = table.insert(accessor, i); > flip the else branch as the if branch, and hence not nest more than 4 level Done https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/ISerializableTable.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/ISerializableTable.java: Line 43: boolean isGarbageCollectioNeeded(); > This method is never called. Do we really need it? Actually, this will be called during a partition spill to the disk from the caller due to a budget exceed. Line 46: throws HyracksDataException; > Why does this need to be expose to the outside? Because we do a garbage collection need check when we spill a data partition to the disk, not after each insertion. So, if the check says the garbage collection is required, we need to have a way to execute the garbage collection. https://asterix-gerrit.ics.uci.edu/#/c/1056/9/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SerializableHashTable.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SerializableHashTable.java: Line 47: private static int INVALID_VALUE = -1; > 1. static final Done Line 262: lastContentFrame.writeInt(lastOffsetInCurrentFrame + i, INVALID_VALUE); > Simplify that using Arrays.fill(...). Done Line 384: for (int i = 0; i < frameCapacity; i++) > Simplify that using one line: Done Line 413: int numberOfHeaderFrame = (int) Math.ceil(tableSize * 2 / frameSize); > MAJOR SonarQube violation: Done Line 413: int numberOfHeaderFrame = (int) Math.ceil(tableSize * 2 / frameSize); > CRITICAL SonarQube violation: Done Line 413: int numberOfHeaderFrame = (int) Math.ceil(tableSize * 2 / frameSize); > Pls address this. Done Line 414: int numberOfContentFrame = (int) Math > Pls address this. Done Line 414: int numberOfContentFrame = (int) Math > MAJOR SonarQube violation: Done Line 415: .ceil((getNumberOfEntryInSlot() * 2 * getUnitSize() * tableSize) / frameSize); > CRITICAL SonarQube violation: Done Line 439: // For calculating the original hash value > wastedIntSpaceCount is not modified after executeGarbageCollection? Done Line 440: TuplePointer tmpTuplePointerForGC = new TuplePointer(); > reuse the object. Done Line 445: int slotCapacity = 0; > MAJOR SonarQube violation: Done Line 446: int slotUsedCount = 0; > MAJOR SonarQube violation: Done Line 447: int capacityInIntCount = 0; > MAJOR SonarQube violation: Done Line 462: slotCapacity = INVALID_VALUE; > MAJOR SonarQube violation: Done Line 463: slotUsedCount = INVALID_VALUE; > MAJOR SonarQube violation: Done Line 486: if ((currentWriteIntOffsetInPageForGC + 4) > frameCapacity > MAJOR SonarQube violation: Done Line 488: // Swipe the region that can't be used. > Using Arrays.fill(...) to simplify that. Done -- To view, visit https://asterix-gerrit.ics.uci.edu/1056 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5 Gerrit-PatchSet: 9 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo Kim <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Wenhai Li <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
