Taewoo Kim has posted comments on this change. Change subject: ASTERIXDB-1566: Fixed External Hash Group By to conform to the memory budget ......................................................................
Patch Set 5: (14 comments) Thanks Jianfeng! https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml File asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml: Line 71: <value>64KB</value> > should the user care about the internal hashtable memory limit? This idea is actually based on Mike's idea and I agreed with this idea. In reality, this config file is changed by system-admin, not an ordinary user who sends a query to the system. https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java File asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java: Line 44: private static final long COMPILER_EXTERNAL_GROUP_TABLE_SIZE_DEFAULT = 16 << 20; // 16MB > just want to clarify, is this the cardinality value? or the physical bytes? physical bytes. We will use this to calculate the cardinality in ApiFrameWork.compileQuery(). https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-docker/docker/asterix-configuration.xml File asterixdb/asterix-docker/docker/asterix-configuration.xml: Line 239: <name>compiler.groupmemory</name> > same as Chen's thought, I think the `groupbymemory` would be more meaningfu I agree. Then, we need to put alias for these two - groupmemory and groupbymemory. Line 250: In an external hash group-by operation, a hash table is used to insert/traverse the group-by value > If I were the user, I would not read a description like this long :-) Since we don't have a detailed reference manual for all settings, I tried put details here. https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java: Line 30: // Return the number of entire frames that are allocated to this buffer manager. > use JavaDoc for Interface? Done https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java: Line 52: // If we couldn't find an already spilled partition, or it is too small to flush that one, > personal opition: a large chunk of comments inside the code breaks the code Done https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java: Line 50: // total number of frames that are allocated > code tells the comment. Done Line 175: } else { > don't need `else`? Done https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java: Line 68: int expectedByteSizeOfHashTableForGroupBy = SerializableHashTable.getUnitSize() > should it be convenient to have a corresponding function inside the `Serial Done Line 179: if (isUsedNumFramesExceedBudget()) { > it's a ideal time to check :-) Agreed. :-) Line 254: if (getNumFrames() <= framesLimit) { > `return getNumFrames() > framesLimit` ? Done https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java: Line 64: // If the result type is SUCCESS_BUT_EXCEEDS_BUDGET, > Again, the large chunk of comment breaks the code flow. It's better to merg Done Line 74: // FAIL case - try to insert again > code tells. Done https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java: Line 179: // With the above table size, there can be 1001 * 40 / 256 (= 156) number of header frames in Hash Table. > this is the function description. Done -- To view, visit https://asterix-gerrit.ics.uci.edu/1056 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5 Gerrit-PatchSet: 5 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Taewoo Kim <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Jianfeng Jia <[email protected]> Gerrit-Reviewer: Taewoo Kim <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Reviewer: Wenhai Li <[email protected]> Gerrit-Reviewer: Yingyi Bu <[email protected]> Gerrit-HasComments: Yes
