Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1566: Fixed External Hash Group By to conform to the 
memory budget
......................................................................


Patch Set 5:

(14 comments)

Thanks Jianfeng!

https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml
File asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml:

Line 71:     <value>64KB</value>
> should the user care about the internal hashtable memory limit?
This idea is actually based on Mike's idea and I agreed with this idea. In 
reality, this config file is changed by system-admin, not an ordinary user who 
sends a query to the system.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java
File 
asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java:

Line 44:     private static final long 
COMPILER_EXTERNAL_GROUP_TABLE_SIZE_DEFAULT = 16 << 20; // 16MB
> just want to clarify, is this the cardinality value? or the physical bytes?
physical bytes. We will use this to calculate the cardinality in 
ApiFrameWork.compileQuery().


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-docker/docker/asterix-configuration.xml
File asterixdb/asterix-docker/docker/asterix-configuration.xml:

Line 239:     <name>compiler.groupmemory</name>
> same as Chen's thought, I think the `groupbymemory` would be more meaningfu
I agree. Then, we need to put alias for these two - groupmemory and 
groupbymemory.


Line 250:       In an external hash group-by operation, a hash table is used to 
insert/traverse the group-by value
> If I were the user, I would not read a description like this long :-)
Since we don't have a detailed reference manual for all settings, I tried put 
details here.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java:

Line 30:     // Return the number of entire frames that are allocated to this 
buffer manager.
> use JavaDoc for Interface?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java:

Line 52:         // If we couldn't find an already spilled partition, or it is 
too small to flush that one,
> personal opition: a large chunk of comments inside the code breaks the code
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java:

Line 50:     // total number of frames that are allocated
> code tells the comment.
Done


Line 175:         } else {
> don't need `else`?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java:

Line 68:         int expectedByteSizeOfHashTableForGroupBy = 
SerializableHashTable.getUnitSize()
> should it be convenient to have a corresponding function inside the `Serial
Done


Line 179:                 if (isUsedNumFramesExceedBudget()) {
> it's a ideal time to check :-)
Agreed. :-)


Line 254:                 if (getNumFrames() <= framesLimit) {
> `return getNumFrames() > framesLimit` ?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java
File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java:

Line 64:                         // If the result type is 
SUCCESS_BUT_EXCEEDS_BUDGET,
> Again, the large chunk of comment breaks the code flow. It's better to merg
Done


Line 74:                         // FAIL case - try to insert again
> code tells.
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java
File 
hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java:

Line 179:         // With the above table size, there can be 1001 * 40 / 256 (= 
156) number of header frames in Hash Table.
> this is the function description.
Done


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1056
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Taewoo Kim <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Wenhai Li <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-HasComments: Yes

Reply via email to