Jianfeng Jia has posted comments on this change. Change subject: ASTERIXDB-1566: Fixed External Hash Group By to conform to the memory budget ......................................................................
Patch Set 5: (14 comments) https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml File asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml: Line 71: <value>64KB</value> should the user care about the internal hashtable memory limit? I thought it could be automatically changed. https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java File asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java: Line 44: private static final long COMPILER_EXTERNAL_GROUP_TABLE_SIZE_DEFAULT = 16 << 20; // 16MB just want to clarify, is this the cardinality value? or the physical bytes? https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-docker/docker/asterix-configuration.xml File asterixdb/asterix-docker/docker/asterix-configuration.xml: Line 239: <name>compiler.groupmemory</name> same as Chen's thought, I think the `groupbymemory` would be more meaningful. Line 250: In an external hash group-by operation, a hash table is used to insert/traverse the group-by value If I were the user, I would not read a description like this long :-) https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java: Line 30: // Return the number of entire frames that are allocated to this buffer manager. use JavaDoc for Interface? https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java: Line 52: // If we couldn't find an already spilled partition, or it is too small to flush that one, personal opition: a large chunk of comments inside the code breaks the code readability. It's will be very hard for readers to navigate the actual code. It's better to move these two chunks on top of the function as a function description. https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java: Line 50: // total number of frames that are allocated code tells the comment. Line 175: } else { don't need `else`? https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java: Line 68: int expectedByteSizeOfHashTableForGroupBy = SerializableHashTable.getUnitSize() should it be convenient to have a corresponding function inside the `SerializableHashTable`? Line 179: if (isUsedNumFramesExceedBudget()) { it's a ideal time to check :-) Line 254: if (getNumFrames() <= framesLimit) { `return getNumFrames() > framesLimit` ? https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java: Line 64: // If the result type is SUCCESS_BUT_EXCEEDS_BUDGET, Again, the large chunk of comment breaks the code flow. It's better to merge them into as one paragraph and add it on top the function. Line 74: // FAIL case - try to insert again code tells. https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java: Line 179: // With the above table size, there can be 1001 * 40 / 256 (= 156) number of header frames in Hash Table. this is the function description. -- To view, visit https://asterix-gerrit.ics.uci.edu/1056 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5 Gerrit-PatchSet: 5 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo Kim <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Wenhai Li <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
